I.  INTRODUCTION


There is a great deal of interest in the improvement of
program and system development efficiency, primarily because
software costs have risen dramatically in recent years as a
fraction of total system development costs. One approach to
the improvement of efficiency is the provision of an
enhanced set of interactive program development tools for
the programmer and the increased automation of program
development. Many such efforts involve the notion of a
"programming environment", that is, an interactive
environment in which a wide selection of software tools is
provided as an integrated package, with a consistent and
relatively concise command structure. Typically, a means is
provided to allow the programmer to work within the language
being used for the program, without having to descend to the
object language level to perform any of the functions
necessary to create, modify, or test the program.

As a concrete example, the reader's attention is drawn
to the most widely known integrated programming environment,
the APL system [Iverson, 1962]. When using this system, the
programmer is able to perform all steps in the program
development process without ever having to issue explicit
commands to the host operating system. The APL environment
itself provides an integrated set of facilities for storing,
editing, and debugging modules which are arranged in
workspaces and libraries, access to which is available using
commands that are part of the APL language definition
itself. In addition, so far as the user is concerned, there
is no notion of translating, linking, or loading individual
functions or programs. To the programmer the system appears
to be capable of evaluating programs written in APL without
translation, and all of the programmer's interactions with
the APL programs defined occur within the syntactic
framework of the original source language.

Other language-oriented programming environments are
under development or in use, notably the ECL project at
Harvard [Wegbreit et al., 1974], which is based on a
LISP-like programming language, and the GANDALF project
[Habermann, 1979], which is based on the new Department of
Defense language, ADA. Both of these projects are designed
to offer an environment which is even more intensively
syntax-oriented than that offered by APL. In addition,
these systems incorporate into an integrated environment a
wide range of facilities normally provided by the host
operating system. The two human engineering ideas
motivating the design of such systems are to free the
programmer from the necessity of learning two command
structures, and to provide the ability to reference and
access parts of the modules being developed using the
natural structure imposed by the syntax of the language in
which they are written.

One of the crucial problems which must be solved in
implementing such an environment is the need to provide more
or less continual access to the evaluable program structure
in a syntax-oriented fashion. Conceptually, the system must
"understand" the syntactical structure of the program during
its entire existence, not simply during the phase in which
it is entered into the system. Thus, the internal structure
of the program must be sufficiently complex to reflect the
syntax of the program at all times, and facilities to
utilize this structure must be on-line during the entire
period of program development. Since such a requirement
must be met for other reasons, a syntax-directed editor is
often offered as the primary means of program entry. Such
an editor utilizes the on-line knowledge of program
structure to allow additions, deletions, and modifications
of the program structure to be made based on the natural
syntactical units of the program, rather than the more usual
line-oriented approach.

Our research was originally motivated by this
application for syntax-directed editing, since the program
access algorithms for the editor are the very routines
involved in program structure access throughout its life in
the programming environment. We wished to investigate the
task of generating a syntax-directed editor from a grammar
description, in the hopes that procedures for routinely
performing such a task could be described in general terms,
if not altogether automated. The belief that a set of
usable rules could be found was encouraged by the fact that
techniques for generating a functionally analogous system, a
parser, from a BNF grammar description are well-understood
and, in fact, frequently automated.

The techniques reported in this paper are fundamentally
very simple, but lie in a direction diametrically opposed to
those involved in parser generation. A parser is a
mechanism for taking a correct word in some language, and
recreating the syntactical structure inherent in that word
from the grammar of the language. That this structure can
be deduced from what would otherwise be a meaningless string
of symbols is a consequence of the fact that the programmer
used a grammar to create it that was equivalent to that used
by the creator of the parser. The program itself represents
a sequentialized version of parallel, hierarchical
structures, one in the mind of the programmer, and the other
internal to the computer system. The programmer has encoded
the structure into the message, and the parser is the
mechanism needed to decode it.

Viewed in this light, the use of a parser-based
translation system is a very odd solution indeed to the
problem of entering a program structure into a computer
system for subsequent execution: it is as if a piano
were to be moved into a house by tearing it into small
pieces, appropriately labelling each one, pushing the pieces
through a mail slot, and relying on an automaton inside the
house to reassemble the piano. This procedure is
notoriously error-prone, and once accomplished, it is
extremely difficult for the programmer to gain access in a
human-oriented way to the actual structure built. Extending
the simile used above, it is as if we could only confirm
that the piano had been reconstructed properly by listening
to the music emanating from the interior of the house after
the piano had been reassembled!

Of course, the historical cause for such a solution is
clear: most general-purpose computing systems, at the time
language translation technology was elaborated, relied
heavily on sequential, batch-oriented input mechanisms such
as card readers, and were like houses without front doors,
only mail slots. There was a driving need to invent such
mechanisms as parsers so that high-level programming could
be done at all.

However, with the increased reliance on interactive,
remote-entry time-sharing facilities, a radically different
solution to the problem of program entry can be
investigated. The program structure can be interactively
built within the computer in the first place. Such a
solution obviates the need for a parser altogether.
Instead, the editor and the programmer cooperate to build
the desired structure directly. The grammatical
specifications of the language are not used indirectly, to
build a decoder for an unnecessary representation, but are
used simply as data to guide an appropriate, direct
synthesis of a well-structured program representation.

This thesis describes such mechanisms in enough detail
to serve as the basis for the implementation of a
language-independent program entry system. The system is
language-independent in the sense that data corresponding very
closely to the grammar of a context-free language itself, in
the form of a finite set of static "transformations", is
directly interpreted by the system to form structures
well-formed under that grammar. If the grammar data is changed,
the same system supports a new language.

We have adopted the term "grammar-driven synthesis" to
describe the function of the systems discussed in this
paper, in order to suggest the idea that grammars with a
rich set of operators are utilized as knowledge bases with
little or no pre-processing. This direct utilization of a
human-oriented grammar is to be contrasted, for instance,
with the extensive pre-processing required to derive
transition tables for driving a shift-reduce parser.

Chapter II describes in very general terms several basic
mechanisms for performing such grammar-driven synthesis,
relating them to the fundamental idea of performing a valid
derivation under a context-free grammar. Chapter III
provides a further elaboration of these mechanisms, aimed
toward the more concrete goal of being able not only to
create, but also to modify and delete parts of a
hierarchical program structure, in a syntactically
consistent way. Chapter IV, which is something of a
digression, considers from the viewpoint of database design
how programs may be represented and accessed as databases
during modification and during storage or transmission from
one place or time to another. In Chapter V, a conceptual
description is presented of a prototype programming
environment, designed to allow the programming language in
use to be changed by simply changing the language
description installed in the system. This design is
concerned solely with the facilities for program
modification and entry, and is based on the assumption that
a means for describing in a relatively simple way the
semantic content of the program structures to be built can
be found. Finally, in Chapter VI, the results of the
research undertaken so far are summarized, and some
suggestions for future investigations are made.


II.  GRAMMAR-DRIVEN  SYNTHESIS 


A.  INTRODUCTION 

In this chapter, several models for grammar-driven
editors of increasing complexity are described in terms of
the theory of context-free grammars. Each editor receives
two sequences of input symbols, the first representing a
context-free grammar, and the second a series of commands
which guides the synthesis of a sentential form of the
grammar initially provided. The described mechanisms are
capable of utilizing very general classes of context-free
grammars, including ambiguous and incomplete grammars as
well as grammars with useless productions (i.e., productions
which do not occur in the derivation sequence for any word
of the defined language). For this reason, we adopt the view
that the fundamental product produced by such a synthesizer
is a sentential form, possibly containing non-terminal as
well as terminal symbols.

The first syntax-directed editor produced by the
research group along the lines outlined in this section was
written by B. MacLennan in November, 1980 in LISP and called
"A Universal Syntax-Directed Editor". The primary motivation
for the analysis of grammar-driven synthesis presented
in this chapter was to perform an exhaustive review of the
algorithms employed and to connect them to the mathematical
theory of context-free grammars in such a way as to justify
the adjective "universal", as well as to provide reasonably
convincing informal arguments that no critical loopholes had
been missed. This technology for using a grammar is compared
with conventional parsing techniques, and the feasibility
of using such synthesizers as the foundation of a
system providing interactive access to a hierarchically
organized database (such as that representing an executable
program structure) is discussed.

B.  GRAMMARS  AND  SENTENTIAL  FORMS

It is assumed that the reader is familiar with the
Backus-Naur Form, or BNF, notation for mathematical grammars.
Appendix A contains a formal specification for this
notational system. The basic concepts from the theory of
context-free grammars used throughout this section are
adapted from [Hopcroft and Ullman, 1979]. The present section
is provided primarily for background and continuity.

A context-free grammar has the following elements:

-- A finite set T of terminal symbols,

-- A finite set N of non-terminal symbols,
disjoint from T,

-- A finite set P of productions, each expressed
in BNF notation,

-- A designated target non-terminal t
included in N.

In addition, for the grammar to be context-free, every
production must be of the form

<a> ::= X,

where X is a string (possibly empty) of terminal and
non-terminal symbols, and a is a non-terminal symbol. The
acronym "CFG" is commonly used to abbreviate the phrase
"context-free grammar". Throughout this chapter, we will
adopt the convention of using lower-case letters from the
beginning of the alphabet to represent non-terminal symbols,
lower-case letters from the end of the alphabet to represent
terminal symbols, and upper-case letters to represent
strings (possibly empty) of terminals and non-terminals.
Since we will be considering only context-free grammars, the
term "grammar" will always be understood to mean
"context-free grammar". We shall also assume that all
grammars considered are non-trivial, that is, that the sets
T and P are non-empty.
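
As an illustration only, the four elements of a grammar and the
conditions just stated might be recorded in Python as follows. The
class names, the tuple encoding of right-hand sides, and the example
grammar are assumptions of this sketch, not part of the formal
development.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    lhs: str              # the single non-terminal a in <a> ::= X
    rhs: tuple[str, ...]  # the string X; () stands for the empty string


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset[str]            # T
    nonterminals: frozenset[str]         # N
    productions: tuple[Production, ...]  # P
    target: str                          # t

    def check(self) -> list[str]:
        """Return the violated conditions (an empty list if none)."""
        problems = []
        if self.terminals & self.nonterminals:
            problems.append("T and N are not disjoint")
        if self.target not in self.nonterminals:
            problems.append("target is not in N")
        if not self.terminals or not self.productions:
            problems.append("grammar is trivial (T or P is empty)")
        symbols = self.terminals | self.nonterminals
        for p in self.productions:
            if p.lhs not in self.nonterminals:
                problems.append(f"left side {p.lhs!r} is not in N")
            if any(s not in symbols for s in p.rhs):
                problems.append(f"unknown symbol in a rule for {p.lhs!r}")
        return problems


# Hypothetical example: <a> ::= x <b>,  <b> ::= y,  <b> ::= (empty)
EXAMPLE = Grammar(
    terminals=frozenset({"x", "y"}),
    nonterminals=frozenset({"a", "b"}),
    productions=(
        Production("a", ("x", "b")),
        Production("b", ("y",)),
        Production("b", ()),
    ),
    target="a",
)
```

Keeping every left-hand side a single non-terminal is what makes
each stored production context-free in the sense defined above.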

1.  Sentential  forms.

The basic intuitive concept underlying the idea of a
context-free grammar is the notion of derivation: the
replacement in a string of a single non-terminal symbol by
an equivalent string of terminals and non-terminals as
specified by some production.

Let G = ( T, N, P, t ) be a grammar, and let S(1)
and S(2) be strings of symbols. (We adopt the notational
convenience of using parenthesized integers to subscript
variable names.) Then we say S(1) derives S(2) in one step
if S(1) and S(2) have the form

S(1) = XaZ,  S(2) = XYZ,

and there exists a production in the set P with the form

<a> ::= Y.

In this case, we write

S(1) => S(2).

In an analogous fashion, we may define the notion of
a leftmost derivation, for which the string X above contains
no non-terminal symbols.
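
The one-step relation can be made concrete with a short Python
sketch. The encoding (strings as tuples of symbols, productions as
(a, Y) pairs) and the sample grammar are assumptions made for
illustration.

```python
def one_step(s, productions, nonterminals, leftmost=False):
    """Yield every S(2) such that S(1) => S(2), where S(1) is the tuple s.

    productions is a list of (a, Y) pairs standing for <a> ::= Y,
    with Y given as a tuple of symbols (hypothetical encoding).
    """
    for i, symbol in enumerate(s):
        if symbol not in nonterminals:
            continue
        x, z = s[:i], s[i + 1:]          # S(1) = X a Z
        for a, y in productions:
            if a == symbol:
                yield x + tuple(y) + z   # S(2) = X Y Z
        if leftmost:
            # Any later choice of a would leave a non-terminal in X.
            return


# Hypothetical grammar: <a> ::= x <b>,  <b> ::= y,  <b> ::= (empty)
P = [("a", ("x", "b")), ("b", ("y",)), ("b", ())]
N = {"a", "b"}
```

With `leftmost=True` only the first non-terminal in the string is
expanded, which is exactly the condition that X contain no
non-terminal symbols.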

A string S is said to derive a string S' in zero or
more steps, or simply derive a string S', if one of the
following conditions is true: either S = S', or else there
exists a series of strings S(1), S(2), . . . , S(n) such
that S => S(1), S(1) => S(2), . . . , S(n) => S'. In this
case, we write

S *=> S'.

A string W is said to be a sentential form of G if
t *=> W, where t is the target symbol of G. A sentential
form with no non-terminal symbols is called a word. The set
of all such words is called the language defined by G. Such
a language is called a context-free language, or "CFL".
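
A bounded search gives a simple, if inefficient, way to experiment
with these definitions. The sketch below is an assumption-laden
illustration: the step bound `max_steps`, the tuple encoding, and the
sample grammar are all hypothetical, and a result of False means only
that no derivation was found within the bound.

```python
def derives(s, goal, productions, nonterminals, max_steps=6):
    """Bounded breadth-first test of S *=> S' on tuples of symbols."""
    s, goal = tuple(s), tuple(goal)
    seen, frontier = {s}, [s]
    for _ in range(max_steps):
        if goal in seen:
            return True
        next_frontier = []
        for current in frontier:
            for i, symbol in enumerate(current):
                if symbol not in nonterminals:
                    continue
                for a, y in productions:
                    if a == symbol:
                        new = current[:i] + tuple(y) + current[i + 1:]
                        if new not in seen:
                            seen.add(new)
                            next_frontier.append(new)
        frontier = next_frontier
    return goal in seen


def is_word(form, nonterminals):
    """A word is a sentential form with no non-terminal symbols."""
    return not any(symbol in nonterminals for symbol in form)


# Hypothetical grammar with target e: <e> ::= <e> + <e>,  <e> ::= n
P = [("e", ("e", "+", "e")), ("e", ("n",))]
N = {"e"}
```

Calling `derives(("e",), W, P, N)` then asks whether W is a
sentential form of this grammar, and `is_word` separates the words
from the other sentential forms.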

A grammar is said to be ambiguous if there exists a
word in the language defined by the grammar with two or more
distinct leftmost derivations. There exist languages
defined by a context-free grammar that are inherently
ambiguous: that is, which cannot be defined by an unambiguous
context-free grammar.
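
The definition of ambiguity can be explored by counting leftmost
derivations directly. The following Python sketch assumes a tuple
encoding of strings and a caller-supplied bound on derivation length;
both grammars shown are hypothetical examples.

```python
def count_leftmost(s, word, productions, nonterminals, max_steps):
    """Count leftmost derivations s *=> word using at most max_steps steps."""
    s, word = tuple(s), tuple(word)
    if s == word:
        return 1
    if max_steps == 0:
        return 0
    for i, symbol in enumerate(s):
        if symbol in nonterminals:
            break
    else:
        return 0    # s is a different word: nothing is left to expand
    if s[:i] != word[:i]:
        return 0    # the terminal prefix already disagrees with the word
    return sum(
        count_leftmost(s[:i] + tuple(y) + s[i + 1:], word,
                       productions, nonterminals, max_steps - 1)
        for a, y in productions
        if a == symbol
    )


N = {"e"}
# Ambiguous:       <e> ::= <e> + <e>,  <e> ::= n
AMBIGUOUS = [("e", ("e", "+", "e")), ("e", ("n",))]
# Left-recursive:  <e> ::= <e> + n,    <e> ::= n
LEFT_RECURSIVE = [("e", ("e", "+", "n")), ("e", ("n",))]
WORD = ("n", "+", "n", "+", "n")
```

For the word n + n + n, the first grammar admits two distinct
leftmost derivations within five steps, while the second grammar
admits one.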

2.  ARGOT  notation.

While BNF notation is convenient for theoretical
manipulations because it incorporates a single underlying
idea, that of replacement in accordance with a production, a
more powerful notation for practical specification of
languages is desirable.

For our purposes, we will adapt a system of notation
called ARGOT notation, with a concise yet powerful set of
replacement operators reminiscent of the operators used in
the theory of regular expressions. This notation was
developed as the core of a pattern-matching programming
language called ARGOT [MacLennan 1975]. In fact, we will
use a restricted version of this notation, but it is
convenient to introduce the full notation first and then
restrict it as required. A formal description of ARGOT
notation is provided in Appendix A.

a.  Rules  and  ARGOT  expressions. 

In place of a set of productions, ARGOT uses a
list of named rules, each of the form:

name: expression.

Rule names perform the same role in ARGOT notation as
non-terminal symbols in BNF notation; however, it is required
that each rule have a unique rule name.

Terminal symbols or strings are denoted by
underlining, use of boldface type, or enclosure by quote
marks ("), whichever is appropriate for the typeface
available.

The colon corresponds to the BNF metasymbol
"::=", separating the rule name from the expression denoting
how an occurrence of that rule name may be expanded. Rules
are terminated by periods to separate rules unambiguously.

The expression half of a rule is an indefinitely
deep hierarchy of elementary replacement operations and
sub-expressions, eventually terminating on the deepest
levels with terminal strings or rule names. Each operator
allows a specific replacement operation, which may be
thought of as being applied from the shallowest level of the
hierarchy downward in a non-deterministic fashion. Thus, a
single ARGOT rule corresponds to a number of equivalent BNF
productions.

b.  Concatenation 

The simplest replacement operator is that of
concatenation, or replacement of a single construct by a
series of sub-constructs. The concatenation operator is
denoted by simple juxtaposition. Concatenated expressions
may be grouped into a single construct and used as a
sub-expression by means of parentheses. A single BNF production
expresses the same idea as a simple ARGOT concatenation
(except that in ARGOT an "empty" rule cannot occur). Thus,
the BNF production

<program> ::= program <identifier> <block> .

is equivalent to the ARGOT rule

program: "program" identifier block "." .

The occurrence of a rule name means that that position in
the sequence is to be expanded as defined by the named rule,
while the occurrence of a terminal string means that that
position in the sequence is to be filled by the quoted
string.

c.  Optional  constructs. 

An optional sub-expression is surrounded by
brackets. The meaning of this operator is that at the
specified point, the indicated sub-expression may either be
placed into the symbol string or omitted. Thus, the rule

statement: [ label ] action.

allows replacement of "statement" by either "label action"
or by "action".

d.  Alternation  Operators. 

Two alternation operators are provided, simple
and optional alternation. Simple alternation is denoted by
means of a list of sub-expressions separated by vertical
strokes and surrounded by curly brackets. The construct may
be expanded by choosing one of the sub-constructs as the
replacement. Thus, by the rule

digit: { "0" | "1" | "2" }.

the rule name "digit" may be replaced by any one of "0",
"1", or "2".

The optional alternation construct is denoted in
the same way as a simple alternation, except that square
brackets are used instead of curly brackets. This operator
allows replacement not only by any of the indicated
alternatives, but also by the empty string. For example, the rule:

sign: [ "+" | "-" ].

allows the rule name "sign" to be replaced by "+", by "-",
or to be deleted (replaced by the empty string).

e.  Iteration  operators.

Three iteration operators are provided. The
required iteration, or simple iteration, is denoted by a
plus sign followed by a sub-expression. This construct
allows replacement by one or more instances of the
sub-expression. Thus, the rule

integer: + digit.

means that an instance of "integer" can be replaced by
"digit", by "digit digit", by "digit digit digit", etc.

Optional iteration, denoted by the asterisk followed
by a sub-expression, implies that the construct can be
replaced by zero or more instances of the sub-expression.
Thus, the rule

astring: * "a".

allows expansion of the rule name "astring" to the empty
string, or to any of the strings "a", "aa", "aaa", etc.

The final form of iteration, list iteration, is
denoted by surrounding two sub-expressions with a sharp sign
on the left and three periods on the right. It allows
replacement by one or more instances of the first
sub-expression, separated by instances of the second
sub-expression. Thus, the rule

list: # atom "," ... .

allows replacement of the rule name "list" by "atom", "atom,
atom", "atom, atom, atom", etc.
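
The replacement operators described above can be summarized by a
small Python sketch that enumerates the replacements permitted by one
expression. The tagged-tuple encoding ("term", "rule", "cat", "opt",
"alt", "optalt", "plus", "star", "list") and the cut-off `limit` on
the number of iterated instances are assumptions of this sketch, not
part of ARGOT itself. Rule names are left unexpanded.

```python
from itertools import product


def expansions(expr, limit=3):
    """Return the set of symbol tuples that may replace one expression.

    Iterations are cut off at `limit` instances (a hypothetical bound).
    """
    kind = expr[0]
    if kind in ("term", "rule"):
        return {(expr[1],)}
    if kind == "cat":      # juxtaposition
        parts = [expansions(e, limit) for e in expr[1:]]
        return {sum(combo, ()) for combo in product(*parts)}
    if kind == "opt":      # [ e ]
        return {()} | expansions(expr[1], limit)
    if kind == "alt":      # { e | e | ... }
        return set().union(*(expansions(e, limit) for e in expr[1:]))
    if kind == "optalt":   # [ e | e | ... ]
        return {()} | expansions(("alt",) + tuple(expr[1:]), limit)
    if kind in ("plus", "star"):   # + e   and   * e
        body = expansions(expr[1], limit)
        out = {()} if kind == "star" else set()
        for n in range(1, limit + 1):
            out |= {sum(combo, ()) for combo in product(body, repeat=n)}
        return out
    if kind == "list":     # # e s ...
        item = expansions(expr[1], limit)
        sep = expansions(expr[2], limit)
        out = set()
        for n in range(1, limit + 1):
            slots = [item] + [sep, item] * (n - 1)  # n items, n-1 separators
            out |= {sum(combo, ()) for combo in product(*slots)}
        return out
    raise ValueError(f"unknown operator {kind!r}")


SIGN = ("optalt", ("term", "+"), ("term", "-"))
INTEGER = ("plus", ("rule", "digit"))
LIST = ("list", ("rule", "atom"), ("term", ","))
```

Under this encoding `expansions(SIGN)` contains the empty tuple
together with the two signs, mirroring the rule for "sign" given
earlier.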

f.  Properties  of  the  ARGOT  notation. 

The most important feature of the notation is
that, although it is richer in operators and in this sense
more expressive than BNF notation, it is not more powerful.
A language is context-free if, and only if, it is expressible
as a finite set of ARGOT rules. This can be shown by
reducing ARGOT to BNF notation, that is, by providing
algorithms for transforming any finite set of context-free BNF
productions to an equivalent set of ARGOT rules, and
vice-versa. This constructive proof is straightforward and
uninformative, as the desired transformations are fairly evident
on an intuitive level.
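
One direction of the reduction, from ARGOT rules to BNF productions,
can be sketched in Python. This is an illustration under stated
assumptions rather than the construction used in the proof: the
tagged-tuple encoding of expressions, the generated names `_g1`,
`_g2`, ... (assumed not to clash with rule names), and the use of
left recursion for iteration are all choices of the sketch.

```python
def to_bnf(rules):
    """Translate {name: expression} into a list of (lhs, rhs) productions."""
    productions = []
    counter = [0]

    def symbol_for(expr):
        """Return one BNF symbol standing for expr, adding rules as needed."""
        if expr[0] == "rule":
            return expr[1]
        if expr[0] == "term":
            return '"' + expr[1] + '"'
        counter[0] += 1
        name = f"_g{counter[0]}"    # hypothetical fresh non-terminal
        define(name, expr)
        return name

    def define(name, expr):
        kind = expr[0]
        if kind in ("rule", "term"):
            productions.append((name, (symbol_for(expr),)))
        elif kind == "cat":
            productions.append((name, tuple(symbol_for(e) for e in expr[1:])))
        elif kind in ("alt", "optalt", "opt"):
            for e in expr[1:]:
                productions.append((name, (symbol_for(e),)))
            if kind != "alt":
                productions.append((name, ()))   # the empty alternative
        elif kind in ("plus", "star"):
            item = symbol_for(expr[1])
            productions.append((name, (item,) if kind == "plus" else ()))
            productions.append((name, (name, item)))
        elif kind == "list":
            item, sep = symbol_for(expr[1]), symbol_for(expr[2])
            productions.append((name, (item,)))
            productions.append((name, (name, sep, item)))
        else:
            raise ValueError(f"unknown operator {kind!r}")

    for name, expr in rules.items():
        define(name, expr)
    return productions
```

Each operator contributes a fixed, small number of productions, which
is why a single ARGOT rule corresponds to several BNF productions.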

As originally defined, the complete ARGOT programming
language, which allows syntactically-keyed computation
as well as input and output parameters to be passed
between rules, has the full computational power of the
lambda calculus [MacLennan 1975]. The notational subset we
are here calling "ARGOT notation" does not have the full
power of the ARGOT language defined in this reference.

The notation can also be regarded as a generalization
of the notion of a regular expression. We may think
of a set of ARGOT rules as being a set of named regular
expressions, and then allow rules to refer to themselves
directly or indirectly to achieve the power of a
context-free grammar. This notational similarity allows the simple
statement of a sufficient (but not necessary) condition for
the regularity of an ARGOT-defined language. If a finite
set of ARGOT rules can be arranged in such an order that the
right-hand side of each rule refers only to rules occurring
further down the list, the language defined is regular.
That this is so can be seen fairly readily. Such an ordering
allows replacement of each rule name except for that of
the target by the right-hand side of each of the named rules
in a terminating sequence. The resulting single rule is
simply a regular expression with operators and terminal
strings alone on the right-hand side.
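
The sufficient condition amounts to asking whether the "refers to"
relation among rules is free of cycles. A minimal Python sketch,
assuming each rule has already been reduced to the set of rule names
its right-hand side mentions (a hypothetical encoding), is:

```python
def regular_order(references):
    """Order the rules so each refers only to rules further down the list.

    references maps each rule name to the set of rule names on its
    right-hand side. Returns None when no such order exists, in
    which case the sufficient condition simply does not apply.
    """
    order, state = [], {}   # state: 1 = being visited, 2 = finished

    def visit(name):
        if state.get(name) == 2:
            return True
        if state.get(name) == 1:
            return False    # a rule refers to itself, directly or not
        state[name] = 1
        for ref in sorted(references.get(name, ())):
            if not visit(ref):
                return False
        state[name] = 2
        order.append(name)  # appended only after everything it refers to
        return True

    for name in references:
        if not visit(name):
            return None
    return [name for name in reversed(order) if name in references]


NUMBER_RULES = {
    "number": {"sign", "integer"},
    "integer": {"digit"},
    "sign": set(),
    "digit": set(),
}
```

A result of None says nothing about regularity either way, since the
condition is sufficient but not necessary.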

This result is of practical use, since if we
know that a language is regular, then we know that simple
(non-recursive) algorithms exist for processing it. The
algorithms for processing it are considerably less
complicated than if the language is context-free but not regular,
in which case some sort of recursive mechanism is required.

3.  Restricted  ARGOT  notation  (R-ARGOT).

The full ARGOT notation, as described, has more
expressive power than required for the application we are
interested in, for two reasons:

-- its indefinitely nested structure requires recursive
routines to access the sub-expressions in a rule, and
-- highly nested expressions are too complicated to
express easily-learned syntax units for the user.

That the notation allows indefinite nesting is implied by
the fact that the notation itself is an inherently
context-free language. Since we shall be accessing the grammatical
descriptions of languages as databases, it is highly
desirable to be able to describe and encode simple, efficient
access routines. In addition, a simpler notation will allow
us to conceptualize a given grammar as consisting of a
collection of rules each of which is formatted in one of a
finite number of ways.

What we would like is a notation that is expressible
as a regular expression (as is BNF notation) so that it is
easily processed, but retains an adequate amount of
expressive power. These goals are met by appropriately
restricting the nesting allowed within ARGOT expressions. The
resulting notation is called R-ARGOT notation (for either
restricted or regular ARGOT).

The set of available operators is restricted to
concatenation, required iteration, simple alternation, list
iteration, and the optional operator. The other operators
are rendered superfluous by the nesting restriction.

R-ARGOT expressions (rule right-hand sides) may be
simple or complex. A simple expression is a concatenation
of one or more terminal strings, rule names, or optional
rule names. A complex expression is an alternation,
required iteration, or list iteration. Any sub-expression
in an alternation or iteration must be a rule-name. The
first sub-expression in a list operation must be a
rule-name. The second may be either a rule-name or terminal
string.

The effect of these rules is to limit the number of
possible formats available for the grammar designer to a
small set. Alternations and simple iteration operators will
always be the topmost operator in a given rule expression if
they occur at all, and the operands will be simple
rule-names in such expressions. The list iteration operator must
also be topmost, and only the second operand may be other
than a rule-name, and if so, must be a single terminal
string. Only if the concatenation operator is topmost may
the operands be alternations, and even in this case no
further operators are allowed in the rule.
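
Because only a handful of formats remain, deciding which format a
rule has requires no recursion. The Python sketch below follows the
definitions of simple and complex expressions given above; the
tagged-tuple encoding and the format names returned are assumptions
made for illustration.

```python
def classify(expr):
    """Return the R-ARGOT format of a right-hand side, or None if invalid."""
    kind = expr[0]

    def is_rule(e):
        return e[0] == "rule"

    if kind == "alt":
        ok = len(expr) > 1 and all(is_rule(e) for e in expr[1:])
        return "alternation" if ok else None
    if kind == "plus":
        return "required iteration" if is_rule(expr[1]) else None
    if kind == "list":
        ok = is_rule(expr[1]) and expr[2][0] in ("rule", "term")
        return "list iteration" if ok else None

    # Otherwise the expression must be simple: a concatenation of
    # terminal strings, rule names, or optional rule names.
    operands = expr[1:] if kind == "cat" else (expr,)

    def is_simple_operand(e):
        return e[0] in ("term", "rule") or (e[0] == "opt" and is_rule(e[1]))

    if operands and all(is_simple_operand(e) for e in operands):
        return "simple"
    return None


PROGRAM = ("cat", ("term", "program"), ("rule", "identifier"),
           ("rule", "block"), ("term", "."))
```

A full ARGOT expression such as an alternation of quoted strings is
rejected, which is the source of the renaming requirement for
terminal strings in R-ARGOT.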

It is something of a surprise that such stringent
restrictions result in grammars that are reasonably
well-oriented toward human comprehension. The rules that result,
when they are read informally, seem to express natural
syntactic units. It must be admitted that an improvement in
human comprehensibility might be attained by allowing one
level of nesting. However, the simplifications in the
rule-access algorithms provided by naming each
sub-expression are so striking we have been led to retain
R-ARGOT as described here.

The languages defined in Appendices A and B are
defined using the R-ARGOT notation. In particular, the
reader's attention is drawn to Appendix B, which contains a
grammar for the PASCAL programming language. Most of the
syntactic rules can be seen to correspond to natural
syntactic constructs within the language in a way that BNF
productions do not.

One irritation encountered in the use of R-ARGOT is
the implicit requirement to rename terminal strings which
carry semantic information (that is, that occur as
alternatives within an alternation). Where we would like to write,
for instance, rules such as

string: + character.

character: { "a" | "b" | ... | "z" }.

we must instead write

string: + character.
character: { a | b | ... | z }.
a: "a".
b: "b".

...

z: "z".

To avoid the necessity to provide a large number of trivial
rules renaming tokens, we shall assume the existence of a
facility in the system for escaping from the normal mode of
grammar-driven synthesis to predefined lexical synthesizers.
Such a facility is analogous to the separation of the
analysis task between the parser and scanner in a
conventional compiler. Thus, we will assume that predefined rules
exist with such names as "identifier", "integer", "string",
etc. In the system to be implemented, these rule names
correspond to predefined input scanners and parsers
available to the language implementer.
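
One way such an escape might look is sketched below in Python. The
table name, the function name, and in particular the three token
patterns are hypothetical stand-ins; the text fixes only the rule
names, not the shape of the tokens they accept.

```python
import re

# Hypothetical lexical synthesizers keyed by predefined rule name.
# The patterns are illustrative guesses, not taken from any appendix.
LEXICAL_SYNTHESIZERS = {
    "identifier": re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "integer": re.compile(r"[0-9]+"),
    "string": re.compile(r"'[^']*'"),
}


def synthesize_token(rule_name, text):
    """Accept text as an instance of a predefined rule, or raise an error."""
    pattern = LEXICAL_SYNTHESIZERS.get(rule_name)
    if pattern is None:
        raise KeyError(f"{rule_name!r} is not a predefined rule")
    if pattern.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a valid {rule_name}")
    return text
```

The grammar-driven synthesizer would call such a routine whenever the
rule name to be expanded is one of the predefined ones, and continue
with ordinary rule expansion otherwise.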

C.  A  SIMPLE  GRAMMAR-DRIVEN  STRING  EDITOR 

In this section, a simple mechanism is described capable
of generating sentential forms from an input grammar in BNF
notation. This mechanism serves as the fundamental model
for grammar-driven editing using interactive production
selection to direct the course of the synthesis.


1. The Basic Mechanism.

We may think of the basic mechanism, which will be
hereafter referred to as a Grammar-Driven String Editor
(GDSE), as a multitape Turing Machine with two input tapes,
labeled PHASE1 INPUT and PHASE2 INPUT, four internal tapes
labeled GRAMMAR, BUFFER, CURSOR, and PRODUCTION, and an output
tape labeled OUTPUT. The PHASE1 INPUT tape contains a
context-free BNF grammar, which is stored internally on the
GRAMMAR tape. The PHASE2 INPUT tape contains a series of
editing commands which will be more fully described shortly.
The BUFFER tape is used as a work area to synthesize a
sentential form. The CURSOR and PRODUCTION tapes are used to
hold indefinitely large integers which number the
non-terminal in the BUFFER currently being expanded, and the
production being applied from the GRAMMAR tape, respectively.
The OUTPUT tape is provided simply as a conceptual
convenience: it is used to model the transfer of the final
form produced to secondary storage.

The  operation  of  the  mechanism  is  as  follows: 

a.  Phase  One  --  Copy  and  Check  Grammar. 

The PHASE1 INPUT tape is copied onto the GRAMMAR
tape. As this is done, the contents of the input tape are
parsed in accordance with the grammar listed in Appendix A
for BNF notation. Since this grammar is regular, the input
tape can be rejected or accepted as a legitimate context-free
grammar in a finite number of steps. Without loss of
generality, we assume that the first production names the
target symbol as its left-hand side.

b.  Phase  Two  --  Initialization. 

In phase two, the mechanism is used to generate
sentential forms via valid derivation steps on the BUFFER
tape. First, the target non-terminal is copied from the
first production onto the BUFFER tape. Then the following
loop is executed. Each cycle corresponds to one step of a
valid derivation.

c.  Phase  Two  --  Loop. 

A symbol is read from the PHASE2 INPUT tape. If
it is 'Q' (for 'Quit'), control is passed to the next step
beyond the loop.

If the order to quit is not received, two
integers are copied from the PHASE2 INPUT tape. These
integers are assumed to encode the relative position in the
buffer of the next non-terminal to be replaced, and the
production in the grammar to be used to replace it. Both of
the integers must be checked to be sure that they refer to a
real non-terminal in the BUFFER and to a real production in
the GRAMMAR. If they do, the left-hand side of the selected
production is checked to make sure it is the same as the
selected non-terminal. If any of these checks fail, the
integers are simply ignored and the loop re-entered from the
beginning. Otherwise, the indicated replacement is performed.
In detail, the mechanism performs the following steps:


First, an integer (suitably encoded) is read
from PHASE2 INPUT and placed in the CURSOR register. Suppose
this integer is N. The N'th non-terminal symbol on the
BUFFER tape is located. If there is none, control is
returned to the top of the loop.

Another integer is then read from PHASE2 INPUT
and copied onto the PRODUCTION tape. Suppose it is M. The
M'th production is located: if there is none, control is
returned to the top of the loop.

The heads are then moved to the N'th non-terminal
on the BUFFER tape, and the left-hand side of the
M'th production, and the two non-terminals compared. If
they are not the same, control is returned to the top of the
loop.


If  they  are  the  same,  the  right-hand  side  of  the 
M'th  production  is  used  to  replace  the  N'th  non-terminal  on 
the  BUFFER  tape,  moving  characters  to  the  right  to  make  room 
for  the  new  symbols  as  needed. 

Finally, control is returned to the top of the loop.


d. Phase Two -- End.

The BUFFER tape is copied to OUTPUT and the
machine halts, accepting.
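
The two phases above can be sketched in Python as follows. This is
only an illustration under stated assumptions, not the mechanism's
actual encoding: the names Production, is_nonterminal and run_gdse are
hypothetical, non-terminals are assumed to be written in angle
brackets, the grammar is assumed to have already passed the Phase One
check, and the PHASE2 INPUT tape is modeled as a list whose entries
are either "Q" or a pair (N, M) of 1-based integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    lhs: str               # a non-terminal, written like "<expr>" (assumed convention)
    rhs: tuple[str, ...]   # right-hand side symbols, in order


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">")


def run_gdse(grammar: list[Production], phase2_input: list) -> list[str]:
    """Run Phase Two and return what would be copied to the OUTPUT tape."""
    # Initialization: the target symbol is the left-hand side of the first production.
    buffer = [grammar[0].lhs]
    for command in phase2_input:
        if command == "Q":
            return buffer                      # Phase Two end: halt, accepting
        n, m = command                         # CURSOR and PRODUCTION values
        positions = [i for i, s in enumerate(buffer) if is_nonterminal(s)]
        if not 1 <= n <= len(positions):
            continue                           # no N'th non-terminal: ignore
        if not 1 <= m <= len(grammar):
            continue                           # no M'th production: ignore
        at = positions[n - 1]
        production = grammar[m - 1]
        if buffer[at] != production.lhs:
            continue                           # left-hand side does not match: ignore
        buffer[at:at + 1] = production.rhs     # one valid derivation step
    raise ValueError("PHASE2 INPUT ended without 'Q'; the machine rejects")


grammar = [
    Production("<expr>", ("<expr>", "+", "<term>")),
    Production("<expr>", ("<term>",)),
    Production("<term>", ("x",)),
]
commands = [(1, 1), (9, 1), (1, 3), (1, 2), (2, 3), (1, 3), "Q"]
print(" ".join(run_gdse(grammar, commands)))
```

In this run the pairs (9, 1) and the first (1, 3) fail a check and are
skipped, so the BUFFER holds a valid sentential form after every cycle.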


e.  Synopsis 


The algorithm described is nothing more than a
restatement, in somewhat more detailed terms, of the fundamental
method for producing some valid sentential form under
a context-free grammar. Determinism has been introduced by
using an additional input phase, which encodes, as the
derivation proceeds, choices for the next non-terminal to be
expanded and the production to be used. Erroneous input
during this phase is ignored. This simple mechanism captures
the essential flavor of grammar-driven synthesis. We
may note that the contents of the PHASE2 INPUT tape may be
obtained in sequence when they are needed, and are never
re-used. Thus, this input process serves as an entirely
adequate model for an interactive process. Throughout the
remainder of this section, we will assume that the "Phase
Two User" is able to examine the internal state of the
machine in order to determine the current state of the
synthesis and decide what to do next. We make this assumption
to avoid cluttering the mechanism descriptions with output
routines, which do not have any impact on the current state
of the synthesis in any event.

2. Properties of the GDSE.

The fundamental property possessed by the GDSE is
that it never contains an invalid form in the BUFFER, and
that a PHASE2 INPUT string exists which will cause the
machine to halt, accepting, with any desired sentential form
on the OUTPUT tape.

In one sense, these assertions are hardly susceptible
to a convincing proof, since the mechanism is so obviously
related to the notion of valid derivation in the first
place that any proof is likely to be less convincing than
this intuition. The proof can be carried through based on
an induction over the number of times the mechanism passes
through the loop. Since the BUFFER contains a valid sentential
form (the target symbol) when the loop is entered the
first time, and each step in the loop either leaves the
BUFFER unchanged or changes one valid form to another by
expanding a single non-terminal in accordance with a production
in the input grammar, the BUFFER contains a valid sentential
form whenever the loop is entered. When the 'Q'
symbol is read, the last form generated is placed on the
OUTPUT tape prior to acceptance. (The machine may reject if
the 'Q' symbol is missing.)

Given a desired sentential form, there exists some
valid derivation sequence of forms, starting with the target
symbol, such that each derives the next in one step, and the
last is the desired form. (There may be more than one such
sequence of steps.) Each step consists of selection of a
non-terminal in the most recently derived form, and its
replacement by the right-hand side of some production. Thus,
given the list of derivation steps, it is easy to construct a
list of pairs of integers for the PHASE2 INPUT tape which will
recreate these steps in the BUFFER. Hence for any sentential
form, there exists a PHASE2 INPUT tape which will cause that
form to appear in the BUFFER. Appending a 'Q' on this tape will
cause the machine to halt, accepting, with the desired form
on the OUTPUT tape.

3. Discussion.

As previously mentioned, although conceptually simple,
the GDSE is the underlying model for all of our more
elaborate grammar-driven mechanisms. The GDSE plays a role
for grammar-driven synthesizers analogous to that played by
a Deterministic Push-Down Automaton (DPDA) for parser-based
systems. The fundamental simplicity of grammar-driven
synthesizers arises from the fact that this underlying mechanism
is a direct restatement, with determinism incorporated,
of the very notion of a sequence of steps in a valid derivation.
The resulting simplicity is to be contrasted with the
much more complicated "set of items" construction required
to generate the DPDA associated with a grammar, which causes
the relation between a grammar and its parser to be very
indirect [Aho and Ullman, 1977]. The GDSE utilizes the grammar
directly to synthesize words, rather than using it
indirectly to produce a derivative mechanism able to decode
words.

We might note that we have allowed the output of the
GDSE to be any valid sentential form, not requiring it to be
composed of strictly terminal symbols. In other words, we
are taking as the fundamental entity defined by a grammar a
sentential form instead of a word. It is easy enough to fix
up the mechanism so that before halting, it checks the
string in the BUFFER for non-terminals and accepts only if
there are none. Our decision not to do so is based on the
philosophy that additional restrictions should not be introduced
so long as the output without them is sensible. In
practical terms, a valid sentential form under a grammar for
a programming language corresponds to a partially complete,
yet well-structured program, with the missing parts labeled
appropriately by non-terminal symbols. In fact, the ability
to deal with such "reasonable" partial programs is one of
the primary advantages of a programming system based on
grammar-driven synthesis.

Retaining this capability yields an even more
interesting property. No problem develops if the GDSE
encounters a non-terminal in the right-hand side of some
production which is undefined. Once this non-terminal is
copied into the BUFFER it can never be replaced, so once
this action has been taken a word will never be derived.
However, the use of an undefined non-terminal can yield a
class of sentential forms. In the context of grammars
defining programming languages, the described situation
might occur if some subset of the complete grammar for the
target language was in use. The resulting form would be
meaningful, and lead to a complete program, once the complete
grammar were defined.

Thus, we see that the class of grammar-driven synthesizers
to be described have the ability to deal intelligently
not only with partial programs, but also with
partially-complete grammars, in a natural way.

Finally, we note that ambiguous grammars present no
problem for the GDSE. If the input grammar is ambiguous,
this simply means that there is more than one way to generate
at least one sentential form.

The question that remains to be answered is whether
grammar-driven synthesizers can be used to synthesize more
interesting constructs than strings (for instance, some data
structure encoding the algorithm represented by the word).
In addition, it is desirable to use a more human-oriented
input code. In the remainder of this chapter, first the
command, and then the synthesis capabilities will be
improved. The resulting mechanisms will inherit the basic
properties of the GDSE, however, which remains our fundamental
model for grammar-driven synthesis.

D. AN IMPROVED GRAMMAR-DRIVEN STRING EDITOR

In this section we improve the Phase Two command mechanism
for the GDSE. The R-ARGOT notation is our primary tool
for doing this. This notation provides for a concise and
human-oriented set of rules as the grammar definition,
allows automatic expansion of rule names when there is only
one way for expansion to be done, and provides a framework
for selection of alternative expansion paths based on keying
the desired alternative by means of a mnemonic keystroke.
Yet the regularity of the notation allows synthesis to
proceed in a straightforward, non-recursive fashion, primarily
because the contents of the rule can be accessed by a
finite automaton. These properties are not coincidental,
since the desire to achieve them provided the primary
motivation for restricting the ARGOT notation in the way
chosen.

1. Rules and transformations.

We eventually would like to classify every possible
rule name replacement according to some finitely-expressible
scheme. To this end, we distinguish between the terms
"rule" and "transformation". For BNF notation, each production
can result in one, and only one, transformation of a
non-terminal symbol to a string of symbols. For ARGOT and
R-ARGOT notation, in contrast, each rule may express more
than one such permissible transformation. The limited nesting
of R-ARGOT operators allows us to list all of the
transformations allowed for an R-ARGOT grammar in a finite
list.

In order to further reduce the set of transformations
possible, we introduce a special class of symbols
which are assumed to be distinct from either rule names or
terminal strings, which we will call "e-symbols". They have
the purpose of serving as place markers in a sentential
form, indicating points where optional strings formed
according to a particular transformation may be inserted.
We will use three classes of such symbols, with the notation
"o(rule name)", "i(rule name)", and "l(rule name)". The
characters "o", "i" and "l" will be used to encode the exact
sort of transformation by which the symbol can be replaced,
and the rule name argument will allow the mechanism to
access the symbols in the grammar by which they can be
replaced. Since their expansion is optional, for output
purposes we may think of all of these symbols as representing
the empty string. When the buffer is to be copied to
output, these symbols are simply skipped.

With this notation in hand, we examine the four
sorts of R-ARGOT rules: concatenations, alternations,
iterations, and list iterations.

Concatenations involve replacement of the rule name
by a sequence of terminal symbols, rule names, and optional
rule names. These elements must occur in order exactly as
specified in the rule. Any optional rule names are converted
to the e-symbol "o(rule name)" when they are encountered.
Thus, the rule:

array-type: [ packed ] "array" "[" ranges "]" "of" type.

allows replacement of the rule name <array-type> in the buffer by

o(packed) array [ <ranges> ] of <type>

(In this section, we shall delimit rule names in the buffer
with angle brackets so that they cannot be confused with
terminal strings.) If the symbol "o(packed)" is never
replaced, this string would be copied to the output tape
simply as

array [ <ranges> ] of <type>

We see that a concatenation rule explicitly stands for a
single, invariant transformation. Implicit in the existence
of an optional field, however, is an additional transformation
of the form

o(rule name) => <rule name>

The use of an e-symbol has allowed us to express what would
have been one transformation with an indefinite format, as
an indefinitely long (but finite) list of transformations,
each of fixed format. This notational trick will be further
used in the next chapter to make the list of transformations
associated with a grammar even more regular.

Alternation rules are always of the form:

name: { name1 ! name2 ! ... ! name-n }.

and correspond to n transformations:

<name> => <name1>
<name> => <name2>
...
<name> => <name-n>

Iteration rules correspond to two transformations: that
performed when the rule name is first replaced, and that
corresponding to additional iterations. Thus, a rule of the
form

name: + name1

corresponds to the two transformations:

<name> => <name1> i( name )
i( name ) => <name1> i( name )

List iteration rules similarly consist of two
transformations. A rule of the form:

name: # name1 name2 ...

corresponds to the transformations:

<name> => <name1> l( name )
l( name ) => <name2> <name1> l( name )
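
The correspondence between the four sorts of rules and their
transformations can be summarized in a short Python sketch. The
representation is an assumption made only for illustration: the
function names show and transformations are hypothetical, a
concatenation body is a list of (tag, text) pairs with tag "t" for a
terminal, "r" for a rule name and "o" for an optional rule name, and
the other rule bodies are plain lists of rule names.

```python
def show(element):
    """Render one concatenation element as it would appear in the buffer."""
    tag, text = element
    return {"t": text, "r": f"<{text}>", "o": f"o({text})"}[tag]


def transformations(name, kind, body):
    """List the (left side, right side) transformations expressed by one rule."""
    lhs = f"<{name}>"
    if kind == "concat":
        # One invariant transformation, plus one implicit transformation
        # o(x) => <x> for every optional field.
        result = [(lhs, [show(element) for element in body])]
        result += [(f"o({text})", [f"<{text}>"]) for tag, text in body if tag == "o"]
        return result
    if kind == "alt":
        return [(lhs, [f"<{alternative}>"]) for alternative in body]
    if kind == "iter":
        (item,) = body
        again = f"i({name})"
        return [(lhs, [f"<{item}>", again]), (again, [f"<{item}>", again])]
    if kind == "list":
        item, separator = body
        again = f"l({name})"
        return [(lhs, [f"<{item}>", again]),
                (again, [f"<{separator}>", f"<{item}>", again])]
    raise ValueError(f"unknown rule kind: {kind}")


array_type = [("o", "packed"), ("t", "array"), ("t", "["), ("r", "ranges"),
              ("t", "]"), ("t", "of"), ("r", "type")]
for left, right in transformations("array-type", "concat", array_type):
    print(left, "=>", " ".join(right))
```

Because every rule yields a fixed, small number of fixed-format
transformations, the complete list for a grammar is finite, which is
the property the e-symbols were introduced to obtain.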

2. Automatic synthesis.

Having listed all possible transformations, we may
now determine which of them can be performed automatically.
Given a rule name, the type of rule is effectively computable
from the form of the right-hand side of the rule alone.
If the rule is an alternation, the user must be consulted in
order to determine which of the n possible transformations
is required. If the rule is a concatenation, there is only
one possible expansion. If the rule is a simple iteration
or list iteration, the initial transformation is required
and should be automatically performed. It may be recalled
that predefined rule names (such as "identifier") are
allowed in an R-ARGOT grammar to symbolize calls to predefined
input scanners. Such rule names do not admit to expansion
by rule, but must be expanded by referral to the predefined
scanner, which may solicit data from the user. Hence,
predefined rules cannot be automatically expanded. There is
one other possibility: the rule name may be undefined. In
this case, no expansion of any kind is possible.

Terminal symbols, by definition, cannot be expanded.
The e-symbols all require user attention so also cannot be
automatically expanded.

As a matter of terminology, we may classify symbols
in the buffer as bound, free, or transient.

Bound symbols are those which admit to no further
replacement. Thus, in our system undefined rule names and
terminal symbols are bound.

Free symbols are those which require a decision as
to whether or not they are to be replaced at all, or by what
transformation they are to be replaced. The free symbols
are thus names for alternation rules and predefined rules,
as well as the e-symbols.

The remaining symbols can be transformed by one, and
only one, transformation which is not optional. They
represent intermediate steps of a required replacement
sequence, may be automatically replaced without restricting
the range of words which can be formed from the sentential
form currently in the buffer, and thus may be regarded as
"transient" in the sense that they are retained only until
they are recognized and replaced by their equivalent
automatically. The transient symbols in the described system
are names of concatenations, iterations, and list iterations.
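
The classification can be written down directly. In the following
Python sketch the function name classify, the (tag, text) encoding of
buffer symbols, and the particular set of predefined rule names are
all assumptions chosen for illustration; rule_kinds is assumed to map
each defined rule name to "concat", "alt", "iter" or "list".

```python
PREDEFINED = frozenset({"identifier", "integer", "string"})  # assumed scanner names


def classify(symbol, rule_kinds, predefined=PREDEFINED):
    """Return "bound", "free" or "transient" for one buffer symbol.

    A symbol is a (tag, text) pair: tag "t" marks a terminal, "r" a rule
    name, and "o", "i" or "l" one of the three classes of e-symbol.
    """
    tag, text = symbol
    if tag == "t":
        return "bound"                 # terminals admit no replacement
    if tag in ("o", "i", "l"):
        return "free"                  # e-symbols: expansion is the user's choice
    if text in predefined:
        return "free"                  # handled by a predefined input scanner
    kind = rule_kinds.get(text)
    if kind is None:
        return "bound"                 # undefined rule name
    if kind == "alt":
        return "free"                  # the user must pick an alternative
    return "transient"                 # concatenation, iteration, list iteration


kinds = {"statement": "alt", "array-type": "concat", "statements": "list"}
for symbol in [("t", "begin"), ("r", "statement"), ("r", "array-type"),
               ("l", "statements"), ("r", "no-such-rule")]:
    print(symbol, classify(symbol, kinds))
```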

Since the expansion of transient symbols can only be
done in one way, at the beginning of each Phase Two loop we
would like to search the buffer for transient symbols and
expand each one found, continuing this process until
all symbols are either free or bound. Unfortunately, for
unrestricted R-ARGOT grammars, there is no guarantee that
this process will terminate. If one can start with a
concatenation, iteration, or list iteration rule and reach the
same rule by applying a sequence of rules not including any
optional or alternation rule, the described process may
never terminate. Therefore, we must restrict the grammar so
that no such cycles exist.

Fortunately, the existence or non-existence of such
cycles can be effectively computed given an otherwise
syntactically correct R-ARGOT grammar. This restriction is the
only semantic constraint we place on R-ARGOT grammars for
the remainder of the discussion. The loss in expressive
power is not great. Such cycles correspond to recursive
expressions with no trivial case in BNF-described languages,
and once entered, derive only forms with non-terminals and
never words.
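
One way such a check might be carried out is a depth-first search
over the rule names that each transient rule introduces
unconditionally. The Python sketch below is an illustration under
assumptions, not the check actually used: the names required_names and
has_autoscan_cycle are hypothetical, rules is assumed to map each rule
name to a (kind, body) pair, and concatenation bodies use (tag, text)
pairs with "t" for terminals, "r" for rule names and "o" for optional
rule names.

```python
def required_names(kind, body):
    """Rule names that appear unconditionally when a rule is first expanded."""
    if kind == "concat":
        return [text for tag, text in body if tag == "r"]   # optional fields excluded
    if kind in ("iter", "list"):
        return [body[0]]        # further items come only from an e-symbol
    return []                   # an alternation always waits for the user


def has_autoscan_cycle(rules):
    """True if automatic expansion of transient symbols could fail to terminate."""
    unvisited, active, done = 0, 1, 2
    state = {name: unvisited for name in rules}

    def visit(name):
        state[name] = active
        kind, body = rules[name]
        for successor in required_names(kind, body):
            if successor not in rules:
                continue                    # undefined or predefined: never expanded by rule
            if state[successor] == active:
                return True                 # reached a rule still being expanded
            if state[successor] == unvisited and visit(successor):
                return True
        state[name] = done
        return False

    return any(state[name] == unvisited and visit(name) for name in rules)


acceptable = {
    "block": ("concat", [("t", "begin"), ("r", "statements"), ("t", "end")]),
    "statements": ("list", ["statement", "semicolon"]),
    "statement": ("alt", ["assignment", "block"]),
}
rejected = {
    "a": ("concat", [("r", "b")]),
    "b": ("iter", ["a"]),
}
print(has_autoscan_cycle(acceptable), has_autoscan_cycle(rejected))
```

The first grammar is recursive, but the recursion passes through an
alternation, so automatic expansion stops there and waits for input.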

With this restriction, which can be enforced by
checking the input grammar during Phase One, we now may
allow automatic expansion of transient symbols during the
beginning of the Phase Two loop prior to any further
processing, with the understanding that such expansion is to be
performed until no transient symbols remain. With the grammar
restricted as described, this process must always terminate.
Since the grammar is context-free, the order in
which transient symbols are expanded is of no consequence.
We will refer to the automatic expansion of all transient
symbols until none remain as "autoscanning".

The addition of the autoscanning feature relieves
the Phase Two user of the burden of having to order expansions
that are required by the grammar. The price paid for
this facility is that only those forms can be produced which
consist entirely of bound and free symbols. In the context
of a programming language defined by a grammar, the system
will now synthesize as much of the program as is syntactically
deducible from the part of the program already created
by the user.

As a concrete example, we display the results of
autoscanning the target symbol for the PASCAL grammar listed
in Appendix B:

program <identifier> ( <identifier> l(filelist) ) ;
o(labels)
o(constants)
o(types)
o(variables)
o(subroutines)
begin
<statement>
l(statements)
end.
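
Autoscanning itself is a simple worklist process. The following
Python sketch uses a small stand-in grammar rather than the grammar of
Appendix B, and its names (initial_expansion, autoscan) and symbol
encoding are assumptions: symbols are (tag, text) pairs with "t" for
terminals, "r" for rule names and "o", "i", "l" for e-symbols, and the
grammar is assumed to have already passed the cycle check.

```python
def initial_expansion(name, kind, body):
    """The single required transformation of a transient rule name."""
    if kind == "concat":
        return list(body)                        # optional fields are already ("o", x)
    if kind == "iter":
        return [("r", body[0]), ("i", name)]
    return [("r", body[0]), ("l", name)]         # list iteration


def autoscan(buffer, rules, predefined=frozenset({"identifier"})):
    """Expand transient symbols until only bound and free symbols remain."""
    result = []
    pending = list(reversed(buffer))             # stack with the leftmost symbol on top
    while pending:
        tag, text = pending.pop()
        rule = rules.get(text) if tag == "r" and text not in predefined else None
        if rule is None or rule[0] == "alt":
            result.append((tag, text))           # bound or free: left for the user
        else:
            pending.extend(reversed(initial_expansion(text, *rule)))
    return result


rules = {
    "program": ("concat", [("t", "program"), ("r", "identifier"), ("t", ";"),
                           ("o", "labels"), ("t", "begin"),
                           ("r", "statements"), ("t", "end.")]),
    "statements": ("list", ["statement", "semicolon"]),
    "statement": ("alt", ["assignment", "while-statement"]),
}
for tag, text in autoscan([("r", "program")], rules):
    print(text if tag == "t" else f"<{text}>" if tag == "r" else f"{tag}({text})")
```

Expanding from a stack keeps the symbols in left-to-right order; since
the grammar is context-free, any other order would give the same form.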

3.  Improved  Cursor  Control. 

The next improvement to be described is a more useful
method of cursor placement.

From the analysis above, we see that after autoscanning
is performed, the buffer will contain only bound and
free symbols. By definition, the only symbols requiring
Phase Two input data for further expansion are free symbols,
since bound symbols admit to no expansion at all. It follows
that the cursor should always rest on a free symbol.
If there are no free symbols, there are no symbols left to
expand in the buffer, and the loop may be left, the buffer
copied to the output tape, and the algorithm terminated. In
general, however, one or more free symbols will be left in
the buffer at the end of autoscan. We wish to allow the
user a means to move the cursor between them, and must also
decide what to do after the symbol indicated by the cursor
has been expanded. It should be clear that cursor movement
never has any effect on either the contents of the buffer
or the valid derivations reachable at any point in the
synthesis. The first is true simply because cursor movement
leaves the buffer unchanged, and the second because of the
context-free nature of the expansion operation.

Accordingly, after autoscanning, if there are any
free symbols left, we allow the user to move the cursor back
and forth by entering zero or more cursor control symbols
(represented by "->" for movement right and by "<-" for
movement left).

The only question remaining is how to position the
cursor initially, and how to reposition it after a symbol is
expanded. We assume that after a symbol is expanded, the
buffer is autoscanned again to remove any new transient
symbols. If the section of the buffer replacing the expanded
symbol now contains one or more free symbols, the cursor is
placed at the leftmost such symbol. Otherwise, it is placed
at the first free symbol in the remaining string of symbols.
If there are none, wraparound takes place and the cursor is
placed at the first free symbol in the old substring to the
left. Initially, the cursor is placed at the first free
symbol in the buffer.
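
The repositioning rule amounts to searching three ranges of the
buffer in a fixed order. A minimal Python sketch follows; the function
name reposition, its parameters, and the string form of the buffer
symbols are assumptions made for illustration only.

```python
def reposition(buffer, is_free, start, length):
    """Return the index of the free symbol the cursor should rest on, or None.

    start and length delimit the section of the buffer that replaced the
    expanded symbol (after autoscanning).  None means no free symbol is
    left, so the loop may be left and the buffer copied to output.
    """
    inside = range(start, start + length)          # the new section itself
    after = range(start + length, len(buffer))     # the remaining string
    before = range(0, start)                       # wraparound to the left part
    for span in (inside, after, before):
        for index in span:
            if is_free(buffer[index]):
                return index
    return None


def looks_free(symbol):
    # Assumed spelling of free symbols for this example only.
    return symbol.startswith(("<", "o(", "i(", "l("))


buffer = ["o(labels)", "begin", "x", ":=", "<expression>", "l(statements)", "end."]
print(reposition(buffer, looks_free, 2, 3))   # section is: x := <expression>
print(reposition(buffer, looks_free, 0, 0))   # initial placement
```

Passing an empty section starting at position zero reproduces the
initial placement at the first free symbol in the buffer.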

4. Transformation Selection.

Finally, we address the problem of causing an
optional transformation to be applied, once the cursor has
been positioned as desired by the user.

From the discussions above, the cursor must be resting
on a free symbol, that is, at either a predefined rule
name or the rule name for an alternation, or at an e-symbol
of type o, i or l. To simplify the command language model,
the entry of a blank is adopted as the uniform means of
indicating that an expansion is to take place at the current
cursor position. If the cursor is at a predefined rule
name, control is then turned over to the indicated predefined
input scanner. If it is at an e-symbol, the appropriate
transformation is made, the result autoscanned, and the
cursor repositioned for another loop through the cycle.
Finally, if the cursor is at the rule name for an alternation,
one of many potential transformations must be
selected. Another symbol is entered and this is matched to
keystrokes included in the rule body.

Thus, we must extend the R-ARGOT notation to allow
inclusion of the keystroke for each alternative which will
trigger it. An alternation now looks like:

statement: { 'a' assignment
           ! 'i' if-statement
           ! 'w' while-statement
           ! 'c' case-statement
           }.

The symbol 'a' will invoke the transformation

<statement> => <assignment>

the symbol 'w' the transformation

<statement> => <while-statement>

and so on.
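
Matching an entered symbol against the keys of an alternation can be
sketched in a few lines of Python. The names STATEMENT and
select_alternative are hypothetical, and the treatment of an unmatched
key (it is simply ignored, as erroneous input is in the GDSE) is an
assumption rather than something the notation prescribes.

```python
# The keyed alternation for "statement", as (key, rule name) pairs.
STATEMENT = [
    ("a", "assignment"),
    ("i", "if-statement"),
    ("w", "while-statement"),
    ("c", "case-statement"),
]


def select_alternative(rule_name, alternatives, key):
    """Return the transformation invoked by key, or None if no key matches."""
    for candidate, target in alternatives:
        if candidate == key:
            return (f"<{rule_name}>", f"<{target}>")
    return None     # assumption: an unknown key leaves the buffer unchanged


left, right = select_alternative("statement", STATEMENT, "w")
print(left, "=>", right)
```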


Extensions to this simple system are easy to implement
and desirable. In particular, a string of more than
one character could be allowed as key. Some work has been
done in allowing a "fall-through" key, symbolized by "''",
which invokes the indicated transformation upon any symbol which
does not occur anywhere else in the list of alternative
keys, and reapplies the entered symbol to the next alternative
generated. Such enhancements are not considered
further in the present work.

Thus, the only data which must be entered during
Phase Two are cursor control commands, which leave the
synthesized string intact but move the cursor, and invocations
of transformations, which consist of a single blank, followed
by nothing for e-symbol expansions (lists, iterations,
or optional field inclusion), by a context-dependent keystroke
for alternative selection, and by whatever is needed
by the appropriate input scanner for such items as identifiers,
numbers, and the like.

5.  Discussion. 

We have now enhanced the capabilities of the GDSE on
the input side to allow string synthesis driven by a
human-oriented grammar, with a reasonably supple means of cursor
control and transformation selection. The resulting mechanism
still has the desirable properties of the GDSE: it can
accept virtually any context-free grammar (we have lost
those which contain irreducible recursions) and generate any
form derivable under that grammar (some of which are
automatically expanded). It is also still true that the
buffer never contains an incorrect sentential form.

The mechanism that has been described in this section
is considerably simpler than that for a parser generator.
This simplicity is the result of allowing interaction
between the user and the synthesizer during the stage when
the grammar of the language is available to the mechanism.
User-provided data is available to guide a true top-down
synthesis of the desired word in the defined language.

The  described  system  is  highly  useful  in  its  own 
right.  It  could  be  used,  for  instance,  to  prepare  programs 
for  entry  into  a  conventional  system  with  the  guarantee  that 
the  program  was  syntactically  correct.  The  compiler  used 
would  not  need  the  ability  to  handle  syntactic  errors  (a 
notably  difficult  design  problem).  In  addition,  since  the 
input  grammar  is  interpreted,  the  same  editor  could  be  used 
for  many  different  languages. 

We want to do more, however. In the next section,
we investigate one way to synthesize more complicated data
structures using the grammar-driven editor we have described
in this section.

E.  TREE  SYNTHESIS 

So far, all of the mechanisms described synthesize
strings. In order to subsume the ideas already developed
under the general notion of tree synthesis, we first
characterize strings as a special sort of tree. We then discuss
the notion of parse trees, and generalize it to form the
more general class of derivation trees, of which both string
trees and parse trees are a special case. Since trees are a
well-understood data structure, we shall not define them
formally but treat their general properties in an intuitive
fashion. For the remainder of this section we shall assume
that the algorithms necessary to create and manipulate
generalized (multi-children), ordered trees are freely available.
Such trees consist of a finite number of nodes, each
of which has a finite number of children occurring in an
ordered sequence.

In addition to having children, we assume that each node
may also contain an indefinite amount of symbolic
information. In particular, with each node may be associated a
string called its label.

Those nodes of a tree with no children are its leaf
nodes. Since the tree is ordered, its leaf nodes may also
be ordered into a linear list. We assume that all of the
nodes of a synthesized tree may be examined and accessed for
the information they may contain.
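
The tree model assumed above can be made concrete with a small
sketch. The names Node, is_leaf and leaves are illustrative
assumptions, not part of the system described here; the sketch
only shows an ordered, labeled tree whose leaf nodes can be
listed in order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered tree: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes below `node` in left-to-right order."""
    if node.is_leaf():
        return [node]
    result: List[Node] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# A small ordered tree (labels are hypothetical).
tree = Node("<root>", [Node("a"), Node("<b>", [Node("c"), Node("d")])])
leaf_labels = [n.label for n in leaves(tree)]
```

Because the children of every node are kept in a list, the
linear ordering of the leaf nodes follows directly from a
left-to-right traversal.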

1. Re-Interpretation of the GDSE.

In all of the work that follows, we use a synthesizer
that is formally identical to the GDSE. We shall
call such a mechanism a GDE, for Grammar-Driven Editor. The
actions taken by those steps in the algorithm that actually
interact with the BUFFER are re-interpreted as calls to
tree-manipulation subroutines. The BUFFER is now conceived
to contain, not strings of symbols, but appropriately
implemented ordered trees with labeled nodes. Rather than
describing the algorithms involved to create, modify, and
traverse such structures in detail, we assume that
mathematically correct subroutines are available to perform the
needed functions, since methods for implementing trees using
a sequentially-addressed, rewritable memory store are
well-known.

In order to re-interpret the improved GDSE as a tree
synthesizer in this way, we need routines to initialize the
BUFFER with a target tree (or initial tree), move the cursor
back and forth, and replace a "symbol" with a "string of
symbols" (whatever these terms mean in the new context).
Also, we now need to explicitly identify the precise means
used to "display" a tree.

Supposing that appropriate routines are available,
we wish to argue that the new mechanism, which synthesizes
trees instead of strings, inherits all of the formal
properties of the original, in the following sense.

The display algorithm in use may be thought of as a
function, d, mapping trees into strings. We shall consider
a tree to be a "sentential form" of the input grammar of
interest if, and only if, its image under d is a string
which is a sentential form of the grammar.

We wish to compare the operation of the old and the
new mechanisms, given exactly the same stream of input
symbols on the PHASE2 INPUT tape, supposing that the grammar
specifications on the PHASE1 INPUT tape are equivalent in
some as yet unspecified sense. The fundamental property
that gives the GDSE all of the features that make it an
appropriate synthesizer for sentential forms is that at each
entry to the loop, the BUFFER always contains a correct
form. This property is a consequence of the fact that the
manipulations inside the loop either leave the contents of
the BUFFER unchanged, or transform one valid form to
another. Since the BUFFER is initialized with a valid form,
by induction the BUFFER never contains anything but a valid
form upon loop entry.

We would like the new mechanism to perform the same
derivation steps, given the same PHASE2 input sequence, as
the old. The display function would then serve as a
morphism from the new mechanism to the old, over the operations
defined by the possible BUFFER transactions made available
by the algorithm within its basic loop. Thus, suppose that
for any given cycle through the loop by the parallel
mechanisms, starting with identical forms in the two BUFFERs
(as viewed under the display function for the new
mechanism), corresponding derivations are undertaken within
the loop. Then for every possible derivation sequence that
can occur under the old mechanism there will be one, and
only one, derivation sequence which occurs under the new
mechanism, and the product of the new mechanism, when viewed
under the display function, will be identical to that of
the old.

The question of paramount interest is: under what
circumstances will this property, that the contents of both
BUFFERs will be display-equivalent for any step in
equivalent machines, be true?

It is well outside of the scope of our research to
provide a complete answer to this question, in the form of a
set of necessary and sufficient constraints under which the
desired property (which we might call "stepwise
equivalence") is true. Rather, we shall provide a
description in general terms of a natural class of
re-interpretation constraints that are merely sufficient.

In the improved GDSE, the PHASE1 INPUT tape
contained a finite set of rules, each of which consisted of a
finite set of transformations with one symbol on the
left-hand side, and a string of symbols on the right-hand side.
In the re-interpreted synthesizer, each transformation will
consist of a specification calling for the replacement of a
single leaf node, labelled with the symbol on the left-hand
side of the original transformation, with a forest of
adjacent siblings with leaf nodes labelled with each of the
symbols on the right-hand side. Such a tree transformation
specification will be referred to as a template.
"Replacement of a symbol by a string" is now taken to mean the
replacement of a labelled leaf node by the forest of
adjacent siblings specified by the appropriate template.

In order to ensure that the structure in the BUFFER
is always a tree (since we may allow replacement of a node
by a forest), it is necessary to ensure that the root node
in the BUFFER is never broken up into a forest. We
therefore impose the constraint on the system that the BUFFER be
initialized with a tree consisting of a special root node
with one child, labeled with the target symbol. Since only
leaf nodes are ever replaced, no replacement ever turns a
previously internal node into a leaf node (no
transformations have empty right-hand sides). Since the root node is
initially internal, it is never replaced. Hence the
structure in the BUFFER is always a bona fide tree.

The above suppositions are insufficient to obtain
the stepwise equivalence property by themselves, since we
have not addressed the display function, which is used to
define what is meant by a tree which is a valid sentential
form.

In the final system to be described, the language
implementer will be given the power both to select a
particular template from all of the valid candidate templates
available, corresponding to the given transformation, and
also to influence the display order of the children of a
given node. The retention of stepwise equivalence depends
jointly on the consistent application of these facilities,
and it is our present intention to provide a sufficient
condition which does, in fact, preserve it.

Selection of a single template for each
transformation in the original grammar may be thought of as specifying
a function, mapping transformations into templates. Let us
name this function f.

In the work immediately following, the display
algorithm will be very simple. A tree is displayed by listing
the labels for all of its leaf nodes in order. Since the
right-hand sides of templates are ordered forests, we may
also speak consistently of applying d to the template:
again, we simply list all of the leaf node labels in order.
The required constraint is simply this: f and d must be
inverse functions on the set of transformations in the
grammar and selected templates. That is, each template must
display as the transformation to which it corresponds.
Finally, movement of the cursor back and forth is to be
interpreted as movement of the cursor from leaf node to leaf
node, as ordered under the display function.
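
The constraint on f and d can be checked mechanically. The
following is a minimal sketch under stated assumptions: a
transformation is modelled as a pair (left-hand symbol,
right-hand symbols), a template as an ordered forest, and the
check compares the displayed template with the right-hand side
of its transformation. The grammar and the three candidate
template functions are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A transformation: (left-hand symbol, right-hand symbols).
Transformation = Tuple[str, Tuple[str, ...]]


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def d(forest: List[Node]) -> List[str]:
    """Display an ordered forest: list its leaf labels left to right."""
    out: List[str] = []
    for node in forest:
        out.extend(d(node.children) if node.children else [node.label])
    return out


def flat_template(t: Transformation) -> List[Node]:
    """One single-node tree per right-hand symbol."""
    return [Node(symbol) for symbol in t[1]]


def nested_template(t: Transformation) -> List[Node]:
    """One tree whose head node keeps the left-hand symbol."""
    return [Node(t[0], [Node(symbol) for symbol in t[1]])]


def reversed_template(t: Transformation) -> List[Node]:
    """A deliberately bad choice: symbols stored in reverse order."""
    return [Node(symbol) for symbol in reversed(t[1])]


def displays_as_transformation(
    f: Callable[[Transformation], List[Node]],
    grammar: List[Transformation],
) -> bool:
    """Check that d(f(t)) equals the right-hand side of every t."""
    return all(d(f(t)) == list(t[1]) for t in grammar)


GRAMMAR: List[Transformation] = [
    ("<statement>", ("if", "<expression>", "then", "<statement>")),
    ("<statement>", ("<identifier>", ":=", "<expression>")),
]

flat_ok = displays_as_transformation(flat_template, GRAMMAR)
nested_ok = displays_as_transformation(nested_template, GRAMMAR)
reversed_ok = displays_as_transformation(reversed_template, GRAMMAR)
```

Both the flat and the nested choice of f satisfy the
constraint, while the reversed one does not, since its
templates no longer display as their transformations.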

Under these conditions, stepwise equivalence will be
retained by the new mechanism. The fundamental reason for
this is that the display algorithm defined is, itself,
"context-free". If a given tree is a sentential form,
application of a template to it will yield a tree which is
also a sentential form. Moreover, the new tree will display
as the same form as that yielded by the corresponding symbol
replacement applied by the string synthesizer. Cursor
movement also takes place in parallel.

Since the new mechanism is stepwise equivalent to
the old, it inherits all of the formal properties of the
old. Of course, since the actual contents of the BUFFER may
be substantially richer in structure at any given time, the
new mechanism may have emergent properties of its own in
addition to those inherited from the GDSE, but such
properties can be utilized only by using an additional algorithm
to access information that has been hidden in internal nodes
of the tree in the BUFFER.

A more flexible display algorithm will be used in
the final system. The implementer will have the power to
permute the display order of the nodes in a template, as
well as to display strings stored with the rule instead of
as labels of a node. The display algorithm retains the
basic property of providing a context-free display, however,
and the same constraint applies to the display and template
specifications chosen: each template must, in fact, display
as its corresponding transformation in order for the system
to maintain stepwise equivalence.

2. Strings as Trees.


We may think of a string as a special sort of tree
which has a root node and one child for each symbol in the
string. Such a two-level tree we shall call a string tree.
For instance, the string

"if <expression> then <statement> o(else-part)"

corresponds to the string tree

                      <root>

   if  <expression>  then  <statement>  o(else-part)

In order to synthesize string trees with a GDE, we
initialize the BUFFER with the tree

   <root>

   <target>

Replacement of a symbol by a string of symbols is
redefined as the replacement of a leaf node by a set of
adjacent sibling nodes, fitted into the place of the
replaced node in the ordered list of leaf nodes. In other
words, the template corresponding to a given transformation
is just an ordered forest of single-node trees.

The resulting GDE, although it does synthesize
trees, constitutes a system that is isomorphic to the GDSE.
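
String-tree synthesis can be sketched as follows. This is an
illustration under assumptions, not the editor's actual
algorithm: the helper names (leaf_slots, display, replace) and
the two transformations applied are hypothetical, and the
cursor is modelled as an index into the ordered list of leaf
nodes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_slots(parent: Node) -> Iterator[Tuple[Node, int]]:
    """Yield (parent, position) for each leaf below `parent`, left to right."""
    for position, child in enumerate(parent.children):
        if child.children:
            yield from leaf_slots(child)
        else:
            yield parent, position


def display(root: Node) -> List[str]:
    """List the labels of the leaf nodes in order."""
    return [parent.children[i].label for parent, i in leaf_slots(root)]


def replace(root: Node, cursor: int, rhs: List[str]) -> None:
    """Replace the leaf under the cursor by a forest of single-node trees."""
    parent, position = list(leaf_slots(root))[cursor]
    parent.children[position:position + 1] = [Node(s) for s in rhs]


# Initial tree: a special root with one child labeled with the target symbol.
buffer = Node("<root>", [Node("<target>")])
replace(buffer, 0, ["if", "<expression>", "then", "<statement>", "o(else-part)"])
replace(buffer, 3, ["<identifier>", ":=", "<expression>"])
shown = display(buffer)
```

After both replacements every symbol is still a direct child of
the root, so the BUFFER holds a two-level string tree whose
display is the current sentential form.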

3.  Parse  Trees. 

The concept of a parse tree occurs frequently in the
theory of context-free grammars.

We can view parse trees as the structures
synthesized by another re-interpretation of the basic
grammar-driven synthesizer. The initial tree is taken to be the
same two-node tree as for the case of string trees. The
notion of replacement of a symbol by a string is
re-interpreted as the addition of children to a leaf node,
labeled with all the symbols of the string. In other words,
templates always take the form of a tree, with the root node
labeled with the left-hand side of the transformation, and
each child labeled with the appropriate symbol from the
right-hand side. As usual, the "string" in the BUFFER is
the ordered list of leaf nodes. The resulting structure is
considerably richer than that retained in the BUFFER by the
GDSE, since once a node is created, it is never removed.
(More accurately, if it is removed while a leaf node, it is
immediately replaced by a copy of itself.)
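
The parse-tree re-interpretation differs from the string-tree
one only in what "replacement" does to the tree. The sketch
below is hypothetical in its names and in the transformations
applied: expanding a leaf attaches the right-hand symbols as
its children instead of substituting them for it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_slots(parent: Node) -> Iterator[Tuple[Node, int]]:
    """Yield (parent, position) for each leaf below `parent`, left to right."""
    for position, child in enumerate(parent.children):
        if child.children:
            yield from leaf_slots(child)
        else:
            yield parent, position


def display(root: Node) -> List[str]:
    """List the labels of the leaf nodes in order."""
    return [parent.children[i].label for parent, i in leaf_slots(root)]


def expand(root: Node, cursor: int, rhs: List[str]) -> None:
    """Keep the leaf under the cursor and give it the right-hand symbols as children."""
    parent, position = list(leaf_slots(root))[cursor]
    parent.children[position].children = [Node(s) for s in rhs]


def count(node: Node) -> int:
    """Total number of nodes in the tree."""
    return 1 + sum(count(child) for child in node.children)


buffer = Node("<root>", [Node("<target>")])
expand(buffer, 0, ["if", "<expression>", "then", "<statement>", "o(else-part)"])
expand(buffer, 3, ["<identifier>", ":=", "<expression>"])
shown = display(buffer)
total_nodes = count(buffer)
```

The display is the same sentential form that the string
synthesizer would hold after the same two steps, but the
expanded non-terminals remain in the tree as interior nodes.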

4. Comparison of String Trees and Parse Trees.

We take the view that string trees and parse trees
are two special cases of a whole range of trees that can
represent a particular sentential form. This observation
can be justified by comparing the properties of the two
types of trees. A string tree incorporates the minimum
amount of historical information concerning the derivation
sequence by which it was produced: just enough for further
derivation to correctly proceed. As a result, string trees
are very compact.

Parse trees, on the other hand, incorporate a very
large amount of information concerning the derivation
sequence by which they were produced: enough so that the
entire sequence can be reconstructed (down to the
permutation of commutative non-terminal selection). As a result,
parse trees are very large. As a concrete example, Figure 1
in Appendix n contains the parse tree for a trivial
PASCAL program.

Our eventual goal is to provide for grammar-driven
synthesis of directly evaluable trees of reasonable size. A
secondary goal is to do this in such a way that the
resulting tree can be displayed as a program in the language in
which it was created, but can be evaluated without any
additional syntactical access.

Neither string trees nor parse trees are suitable
constructs for achieving these goals. String trees
incorporate no structural information and must be reparsed in
order to access their semantic contents in the correct
order. (This process may even be impossible if the string
tree was synthesized under an ambiguous grammar.) Too much
information has been discarded at the time of synthesis.

On the other hand, parse trees are unreasonably
large. Most of the nodes record syntactical information
that is semantically content-free.

Our task, therefore, is to find a way to reach some
middle ground, synthesizing trees which contain enough nodes
to retain the desired control structure, but allowing the
elimination of nodes which have no semantic content.

The purpose of the present section is not to provide
a complete description of how this is to be done, but to
provide a conceptual range of intermediate possibilities.
It will then be possible to choose intelligently the sort
of tree to be synthesized to meet a particular requirement.
In short, we wish to introduce some "engineering slack" into
the formal system.

This purpose is realized by introducing the notion
of derivation trees, a general concept of which both parse
trees and string trees are special cases.

5. Derivation Trees.

One way to characterize the structure of a parse
tree is to note that every parent node in the tree derives
its children in exactly one step. Thus, the relation
between parents and children in the tree is the same as the
"=>" relationship.

We consider the set of trees in which each parent
derives its children in zero or more steps; that is,
incorporates the "*=>" relationship.

Such  trees  may  be  constructed  from  a  parse  tree  in 
the  following  manner: 

a. Mark the root and leaf nodes.

b. Mark zero or more of the remaining nodes.

c. Discard each unmarked node. Every time a
node is discarded, replace it within the
set of its siblings by all of its children,
taken now as adjacent siblings. (This
procedure preserves the relative ancestry
of all undiscarded nodes.)

The above procedure assures that every remaining
node derives its new children in zero or more steps. This
can be seen by noting that the hypothesis is true for the
original parse tree, and that if it is true for a discarded
node and its children, it is true for that node's parent and
the parent's new children after each application of the
third step. Hence, it is true for the resulting tree.

In the procedure just specified, the selection of
interior nodes to be retained is done non-deterministically.
It is the specification of the particular algorithm to be
used for selecting nodes for retention that we make
available to the system implementer as an engineering choice.
The two simplest algorithms are to retain all interior
nodes, in which case parse trees are produced, or to discard
all interior nodes, in which case string trees are produced.
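
The mark-and-discard procedure can be sketched as a recursive
pruning function. This is an illustrative sketch only: the
names prune, derivation_tree and keep are assumptions, the
marking of interior nodes is supplied as a predicate, and the
example parse tree uses hypothetical labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Callable[[Node], bool]) -> List[Node]:
    """Return the forest that stands in for `node` after pruning its subtree."""
    if not node.children:
        return [Node(node.label)]            # leaf nodes are always marked
    kept = [n for child in node.children for n in prune(child, keep)]
    if keep(node):
        return [Node(node.label, kept)]      # marked interior node survives
    return kept                              # discarded: children become siblings


def derivation_tree(root: Node, keep: Callable[[Node], bool]) -> Node:
    """Prune a parse tree; the root is always marked."""
    return Node(root.label, [n for c in root.children for n in prune(c, keep)])


def leaf_labels(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [label for child in node.children for label in leaf_labels(child)]


parse = Node("<root>", [Node("<target>", [
    Node("if"), Node("<expression>"), Node("then"),
    Node("<statement>", [Node("<identifier>"), Node(":="), Node("<expression>")]),
    Node("o(else-part)"),
])])

full = derivation_tree(parse, lambda n: True)     # retain all interior nodes
flat = derivation_tree(parse, lambda n: False)    # discard all interior nodes
mixed = derivation_tree(parse, lambda n: n.label == "<statement>")
```

Retaining every interior node reproduces the parse tree,
discarding every interior node yields the string tree, and any
other predicate gives an intermediate derivation tree with the
same ordered list of leaf nodes.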

The trees produced by the procedure just described
we call generalized derivation trees. Our goal, however, is
not to produce a full parse tree and only then to prune it,
but to synthesize a pruned derivation tree directly as we go
along.

This desire suggests that we apply a particular
synthesis uniformly, in the sense that for each
transformation implicit in the R-ARGOT grammar there be associated
one, and only one, synthesis action. This suggestion is not
quite a necessary implication: one could conceive of some
history or context-dependent algorithm for selecting one of
several predefined synthesis actions associated with a
transformation. In fact, such "intelligent" systems are an
interesting subject for future research.

But if the simpler protocol is adopted, we obtain a
sub-class of derivation trees, which we call derivation
trees constructed by rule. Both parse trees and string
trees are also members of this class. Hereafter, the term
"derivation tree" will be understood in this restricted
sense.

The association of one, and only one, template with
each transformation is very clearly an embodiment of this
idea. The GDE previously described is thus a mechanism
capable of synthesizing any class of uniform derivation
trees desired for a given grammar in R-ARGOT.

In essence, the next chapter represents the
selection of further constraints on the template formats to be
associated with each type of transformation, in such a way
that our design goals are achieved. The trees produced
under the set of protocols are a particular sort of
derivation tree constructed by rule, which we shall call hereafter
abstract syntax trees. This name is adopted from the ideas
contained in [McKeeman 1970] as representing an intermediate
stage in the translation of some program in which a parse
tree has had its syntax-dependent, semantically void
interior nodes pruned away.

6. Elimination of Terminal Strings in Derivation Trees.

An inspection of parse trees such as the one
displayed in Figure 1 suggests three general classes of
nodes for elimination: those representing a series of
production steps needed to fill a high-level slot with a
low-level construct (so-called "empty productions"); those
encoding options available but not so far taken (e-symbols);
and those representing keywords and punctuation.

As the next chapter shows, selection of appropriate
template protocols allows removal of nodes representing
empty productions. It is our belief that nodes of the
second type can also be eliminated by appropriate template
selection and a context-sensitive computation of the
existence of a "virtual" option.

We now investigate a methodology for eliminating
most nodes required to hold terminal strings.

We first make the observation that most such nodes
are semantically content-free. An examination of the
R-ARGOT notation will show that terminal symbols can only be
added to a synthesis in one of two ways: by means of a
concatenation or list-iteration transformation, or by means of
a predefined (autoparsed) rule name expansion. In the
second case, the included string may well be meaningful,
e.g. if it is an identifier or the like. In the former
case, however, since the required terminal string cannot be
an optional field, there is no choice as to whether the
string can or cannot be included. If such a choice existed,
it must have been via an earlier option or alternative
selection, and by the template protocols specified in the
next chapter, this selection is already encoded into the
structure of the tree. There is thus no reason to add a
node to the tree simply to represent an invariant field.

On the other hand, for the system to be usable we must be
able to display the string as if it were a node in the tree.
The solution to this quandary is to make provision for
computing the location and contents of such virtual fields when
the need arises. This can be done, provided that list and
concatenation rule templates always have a single head node
which can be associated with the specific rule from which
they were derived in some way (either by inserting a
reference to the rule into the node, or computing the rule from
context). If the contents of the virtual fields associated
with the rule are then stored with the rule, we can avoid
repeating these strings throughout the derivation tree.
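
One way to picture virtual fields is sketched below. It is an
assumption-laden illustration, not the template protocol of the
next chapter: each head node carries a reference to its rule,
and a hypothetical table SCHEMAS stores, once per rule, the
terminal strings and the order in which the children are
displayed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Display schema stored once with each rule (hypothetical rules):
# a string is terminal text, an integer selects a child of the head node.
SCHEMAS: Dict[str, List[Union[str, int]]] = {
    "if-stmt": ["if", 0, "then", 1],
    "assign": [0, ":=", 1],
}


@dataclass
class Node:
    rule: str                 # reference to the rule the node was derived from
    text: str = ""            # meaningful string, e.g. an identifier
    children: List["Node"] = field(default_factory=list)


def display(node: Node) -> List[str]:
    """Interleave the rule's stored terminal strings with the displayed children."""
    if not node.children:
        return [node.text]
    out: List[str] = []
    for item in SCHEMAS[node.rule]:
        if isinstance(item, str):
            out.append(item)                      # virtual field, not a node
        else:
            out.extend(display(node.children[item]))
    return out


tree = Node("if-stmt", children=[
    Node("ident", text="x"),
    Node("assign", children=[Node("ident", text="y"), Node("number", text="1")]),
])
shown = display(tree)
```

The keywords "if" and "then" and the ":=" symbol appear in the
display without occupying any node of the tree, since they are
computed from the rule associated with each head node.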


These  ideas  are  more  concretely  discussed  in  the 
protocols  for  template  construction  in  the  next  chapter. 

F.  COMPARISON  OF  GRAMMAR-UTILIZATION  TECHNOLOGIES 

It is appropriate at this point to step back and place
the system of grammar utilization described in this chapter
within the range of currently available technologies for
grammar utilization. We shall compare this system with the
two common parsing techniques: bottom-up and top-down
parsing. All three of these techniques may be thought of as
producing derivation trees as output.

It should be recognized that the tree produced by a
parser in contemporary translation systems is usually
"virtual". The parser emits a series of syntax-directed action
commands which may be thought of as the sequential
representation of a post-order traversal of a derivation tree. The
"back end" of the system may be thought of as traversing
behind the parser, destroying nodes as quickly as they are
built.
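
The sense in which a sequence of action commands represents a
tree can be illustrated with a post-order traversal. The
sketch below is only an illustration; the expression tree and
the function name post_order are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def post_order(node: Node) -> Iterator[str]:
    """Visit all children left to right, then the node itself."""
    for child in node.children:
        yield from post_order(child)
    yield node.label


# Hypothetical derivation tree for the expression a + b * c.
tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
actions = list(post_order(tree))
```

Each label is emitted only after everything below it, so a
consumer that processes the sequence in order never needs the
whole tree to exist at once.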

Both of the parsing techniques are designed to proceed
automatically, that is, without any human intervention. The
grammar-driven synthesizer, in comparison, is inherently
interactive. This property is both an advantage and a
disadvantage, in that the synthesizer utilizes interaction
to attain desirable goals, but cannot be implemented without
interactive devices being available.

The need for the parser-oriented techniques to proceed
automatically places a set of mathematical constraints on
the grammars usable by such systems. The grammar-driven
synthesizer is capable of utilizing almost any context-free
grammar, a capability that allows the language designer to
optimize the grammar selected for realizing some programming
language towards a set of semantically natural rules which
will be easy for the human user to understand.

The parser-based systems are essentially decoders,
translating a valid word in the defined language into a more
complicated, but equivalent, structure. Inherent in this
process is the requirement for the user to use some other
system, such as a keypunch or text editor, to formulate a
valid input word in sequential form; a notoriously
error-prone and tedious process. In contrast, the grammar-driven
synthesizer allows the user to create the desired tree
structure directly and with no possibility of syntactic
error (since such errors are simply rejected immediately).

Finally, we note that both parsing techniques synthesize
the output tree from the bottom up. The grammar-driven
synthesizer follows a true top-down synthesis; thus, the
partially-complete structure is completely well-structured
so far as it goes. The system is for this reason
well-suited as a base for dealing with partially complete
programs.

III. CONCEPTUAL DESIGN FOR GDE


A.  INTRODUCTION 

In this chapter a conceptual design for a Grammar
Directed Editor is developed within the framework defined in
Chapter II.

The mathematical model provides a large framework in
which to design a Grammar Directed Editor, subject to the
following restrictions:

1. Grammar rules are limited to the concatenation,
alternation, iteration, list, predefined, and undefined
rules in the forms specified by the R-ARGOT notation.

2. The templates associated with these grammar rules
may consist of arbitrary forests of siblings, the leaves of
which must be labelled in accordance with the
transformations summarized in Figure 2.

3. The templates for list and concatenation rules which
include terminal symbols must create head nodes which retain
or refer to those terminal symbols for display.

A Grammar Directed Editor constructed in accordance
with these restrictions will produce a derivation tree whose
leaves and terminal symbols, retained in head nodes, are
displayable as a valid derivation of the input grammar.

The following design restrictions and goals serve as a
basis for limiting the very general nature of the possible
templates to a set of generic templates which define the
permissible transformations available for the construction
of an Abstract Syntax Tree (AST):

1.  The  AST  should  contain  the  minimum  number  of  nodes 
consistent  with  the  retention  of  all  necessary  semantic  and 
schematic  information. 

2.  The  structure  of  the  AST  should  admit  efficient 
editing  algorithms,  in  particular  for  append,  delete,  and 
insert  functions. 

3.  The  AST  should  not  only  be  an  evaluable  structure, 
but  further  it  should  require  no  "preprocessing"  between 
editing  and  evaluation  operations. 

4. The generic transformation template structure should
be such that the creation of specific templates for a given
grammar can be automated over the simplest possible input
data, perhaps as simple as a grammar in a suitable notation.

The methodology employed in the design process described
in the following section is to apply, working within the
constraints which the mathematical model suggests, such
further constraints and definitions as may be necessary to
develop generic templates for each transformation which
realize the design goals. In Section C, a method for
displaying the AST is developed which is consistent with the
generic templates as well as with the requirement that the
valid derivation which the AST represents be displayable as
such. Section D introduces the notion of a Language
Definition, wherein an R-ARGOT grammar is translated into an
ordered collection of transformation templates and display
schemas which serves as the basis for the construction and
display of an AST.

B. TRANSFORMATIONS

1. Operators and Rulenames

Figure  2  is  the  result  of  precisely  defining  the 
leaves  produced  by  each  of  the  transformations  defined  in 
Chapter  II. 

A simple change in notation produces Figure 3,
wherein every rulename in a transformation is associated
with an operator to form a two-part label, as follows:

   <r>      =  NT,r
   copt(r)  =  COPT,r
   iopt(r)  =  IOPT,r
   lopt(r)  =  LOPT,r
   pdf(p)   =  PDF(p),p

where r is any grammar rulename and p is any predefined
rulename. The first part of a label, the operator, will
guide future transformations. The second part, the
rulename, serves as a reference to that section of the
language-specific data base containing the information
required for performing transformations or display. In
other words, labels may be thought of as a self-modifying
"program" for the Grammar Directed Editor stored in the
hierarchical AST structure by previous versions of the
program, encoding all of the information necessary for
subsequent modifications or display of the structure.
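
The notational change above amounts to a mechanical translation
from grammar symbols to two-part labels. The sketch below
assumes the symbols are written exactly as in the table; the
Label type and the function label_for are hypothetical, and
representing the operator of a predefined rule as the string
"PDF(p)" is likewise an assumption.

```python
import re
from typing import NamedTuple


class Label(NamedTuple):
    operator: str    # guides future transformations
    rulename: str    # key into the language-specific data base


_SYMBOL = re.compile(
    r"<(?P<nt>[^<>]+)>|(?P<op>copt|iopt|lopt|pdf)\((?P<arg>[^()]+)\)"
)


def label_for(symbol: str) -> Label:
    """Translate a grammar symbol into its (operator, rulename) label."""
    match = _SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognised rule reference: {symbol!r}")
    if match.group("nt") is not None:
        return Label("NT", match.group("nt"))
    op, arg = match.group("op"), match.group("arg")
    if op == "pdf":
        return Label(f"PDF({arg})", arg)
    return Label(op.upper(), arg)


examples = [label_for(s) for s in ("<statement>", "copt(else-part)", "pdf(identifier)")]
```

Since each label is determined by one operator keyword and one
rulename, a finite set of rules can only give rise to a finite
set of labels.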

Note that, as a result of the notational convention
adopted here, the set of possible labels is finite over
a finite set of grammar rules and, therefore, the set of
templates required for such a grammar is also finite.
Further, the type of transformation which may be applied to
a given node is determined entirely by the operator and rule
type association stored within that node.
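
The finiteness argument can be illustrated with a small
sketch. It is not part of the GDE; the names Label, OPERATORS,
and all_labels, and the sample rulenames, are hypothetical. A
label is modelled as an (operator, rulename) pair, and pairing
every operator with every rulename gives an upper bound on the
label set:

```python
from dataclasses import dataclass
from itertools import product

# Operators named in the text so far; PDF stands for the result
# of applying a predefined function.
OPERATORS = ("NT", "COPT", "IOPT", "LOPT", "PDF")


@dataclass(frozen=True)
class Label:
    operator: str  # selects the kind of transformation
    rulename: str  # key into the language-specific data base


def all_labels(rulenames):
    """Upper bound on the label set: each operator with each rulename."""
    return {Label(op, r) for op, r in product(OPERATORS, rulenames)}


# Hypothetical three-rule grammar.
labels = all_labels(["stmt", "expr", "ident"])
```

Since the set is finite, a table keyed by label is enough to
select a transformation without consulting the node's context.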

The alternation and predefined transformations
present a problem, however: although the "NT" opcode is
usually stored in transient nodes, these two particular
transformations must be stored in free nodes. The
alternation requires that the user select one of the possible
alternatives, and the predefined functions require that the
user input a string which they then process. This
irregularity is resolved by the introduction of two new operators
ALT and TERM and the following pairs of transformations:


NT,a   => ALT,a
ALT,a  => { NT,ri }

NT,p   => TERM,p
TERM,p => PDF(p),p

The operators "ALT" and "TERM" are [...]cally equivalent
to "NT", but [...] the nodes ([...] display
purposes). Figure 4 reflects these modifications to the
general transformation table.

The introduction of the two new labels ALT,a and
TERM,p, while not altering the leaves produced by the
original transformations and thus not violating the validity of
the mathematical model's results for systems based on this
extension, provides the following benefits:

a. The format for the five defined types of template
sets is more regular. At least two transformations
are associated with each rule type. The first of these
transformations is, in every case, a required transformation.
The second and following transformations require some
form of interaction with the user.

b.  Every  node  whose  label  has  an  "NT"  operator  may 
be  automatically  expanded  during  the  autoscan  process. 
Thus,  after  autoscan,  the  only  leaves  whose  labels  contain 
the  "NT"  operator  will  be  those  corresponding  to  undefined 
rules. 

c. Since for every unique label there is one and
only one transformation possible, no contextual information
need be extracted from the AST in order to select and
perform the correct transformation. This simplifies the tasks
both of language implementation and of AST formation,
since production and invocation of a transformation template
is independent of any AST contextual considerations.

2. Transformation Restrictions

The transformations as discussed so far define only
the leaves of a possible forest of siblings which are to
replace a particular node of the AST. We now turn our
attention to designing the interior structure, if any, of
the forests generated by the transformation templates. In
the absence of other design goals or restrictions, the
driving motivation in determining the forest structure is to
obtain as much simplicity and economy of space as possible.
These goals must be balanced with the necessity to retain
semantic or schematic information to preserve the valid
derivation property, as well as to retain sufficient
structural information so that insertion and deletion editing
functions may be convenient for the user as well as
efficient algorithmically. The requirement to be able to delete
synthesized subtrees turns out to constrain the template
structures such that the other goals are also met.

In order to recover gracefully from erroneously
constructed portions of the AST, the user should have the
capability to delete any node in the AST, which, as for any
hierarchical structure, inevitably involves the ability to
delete any subtree. The valid derivation property of the
AST requires that deletion of a subtree from an AST be
realized as the replacement of the entire subtree by a node
which can validly derive that subtree and which also forms a
valid derivation with the remainder of the AST. The choice
of the transformation to be applied to a node in the AST is
based solely on the information contained in the node itself
and is completely independent of the node's context.
Therefore, deletion of a subtree must be equivalent to
replacement of that subtree by a node with the same label (that is,
the same operator and rulename) that the root of the deleted
subtree carried when it was originally created, before it
was expanded. The constraints provided by the
abstract model of Chapter II are not sufficient to guarantee
that this can be consistently and efficiently accomplished.
For example, consider a grammar which has only concatenation
rules, each of which consists entirely of either nonterminal
symbols or terminal symbols. Since the model allows the
definition of templates without a head node for concatenation
rules which have no terminal symbols, the tree derived from
such a grammar could be a string tree, containing no information
for reconstructing a node being considered for deletion.
The only action possible for a deletion algorithm in this
case would be to delete the entire tree. However, consider
the effect of the following proposed restrictions:

a. All immediate children of a (necessarily bound)
node must be created by the transformations of the rule by
which their father was bound.

b. When a node is bound, the rule whose
transformation bound the node is permanently recorded in the node.

c. A given transformation may generate two or more
childless siblings, or a subtree of the current node, but
not both.

d. If a subtree is created by a transformation, it
is limited to at most a single generation of children and
may consist of a single node.

Given these restrictions, the rule (and therefore,
at worst, a choice between two transformation templates)
which originally created any given node in the AST can be
identified by examining its father. Computation on the
father rule templates allows retrieval of the unique node
from which the subtree to be deleted was formed. This
uniqueness is further discussed below.

3. Transformation Templates

Given the restrictions developed in the previous
section, we are prepared to define the forests produced by
each of the eleven transformations. The notation utilized
in the transformation templates below is defined in Appendix
C.

a. Concatenation

Rule:

c : x1 x2 ... xn , xk = < rk | "["rk"]" | tk >

Template:

NT,c => headop,c ( ... < NT,rk   if xk = rk
                         COPT,rk if xk = "["rk"]" > ... )
                    if for some k, xk = < rk | "["rk"]" >

NT,c => headop,c    if for all k, xk in T

headop = < HEAD | predefined function >

There are six cases to be considered in the
transformation to be applied to the label NT,c:

          nonterminals  terminals  comment

Case 1:   0             NO         undefined rule
Case 2:   1             NO         useless production
Case 3:   >1            NO         head required by delete
Case 4:   0             YES        terminals only
Case 5:   1             YES        head required by model
Case 6:   >1            YES        head required by model

Case 1 corresponds to the undefined rule wherein
no righthand side of the rule exists. The undefined rule
transformation is discussed below.

In cases 3, 5, and 6 it is required that a head
node be created, in cases 5 and 6 by the mathematical model
for the retention of terminal information and in all cases
by the restrictions defined for the deletion algorithms. In
each case the head node replaces the nonterminal under
transformation and the nonterminal and/or optional children
are realized as the immediate children of the head node.

In case 4 a head node retaining the terminal
information replaces the nonterminal being transformed.
Since there are no nonterminals in the grammar rule for
which this form of this transformation is utilized, no
children are created. Note that this node is bound since it is
transformed into a node which is not one of the label forms
for which transformations are defined; in fact, this is the
only bound leaf node form generated outside the realm of
predefined functions.

Case 2 is the useless production. We could,
without violating any of the restrictions thus far imposed,
define this case of this transformation as a single node
replacement, i.e., as NT,c => NT,r, thus avoiding the
creation of a head node carrying no information. However, we
see the useless production as a very rare and usually
unnecessary occurrence which does not justify the increased
algorithmic complexity required for its detection.
Therefore, it is treated in the same manner as cases 3, 5, and 6.

Implicit Template:

COPT,r => NT,r

This label must be accompanied by some form of
user attention in order that the transformation be invoked,
the nature of which is discussed in the next section.
Assuming for the moment that the user has elected to take
the option, the transformation applied is a single node
replacement wherein the operator COPT is overwritten with
NT, and the rulename remains unchanged.

Note that the rulename in the COPT label may be
any of the six rule types, including undefined, which raises
the question of where to store the template for this
transformation. The solution is to make this transformation
implicit, that is, to apply the transformation without an
explicit template being stored in the grammatical data base.
This may be done since the transformation is invariant over
all rules in any grammar, depending only on the requisite
user attention and the COPT operator.
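
The concatenation transformation and the implicit COPT
template can be sketched as follows. This is an illustration
only: the Node class, the encoding of a right-hand side as
("nt", r), ("opt", r), and ("t", text) items, and the function
names are assumptions, and the head operator is always taken
to be HEAD rather than a predefined function. The undefined
rule (case 1) is not handled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    operator: str
    rulename: str
    children: list = field(default_factory=list)


def expand_concatenation(node, rhs):
    """Apply NT,c => HEAD,c ( ... ) to `node` in place.

    `rhs` is a hypothetical encoding of the rule's right-hand side.
    """
    assert node.operator == "NT"
    node.operator = "HEAD"  # the head node replaces the nonterminal
    for kind, name in rhs:
        if kind == "nt":
            node.children.append(Node("NT", name))
        elif kind == "opt":
            node.children.append(Node("COPT", name))
        # Terminals create no child; the head node stands for them.
    return node


def elect_option(node):
    """Implicit template COPT,r => NT,r: overwrite the operator only."""
    if node.operator == "COPT":
        node.operator = "NT"
    return node


# Hypothetical rule: if_stmt : "if" cond "then" stmt [else_part]
rhs = [("t", "if"), ("nt", "cond"), ("t", "then"),
       ("nt", "stmt"), ("opt", "else_part")]
head = expand_concatenation(Node("NT", "if_stmt"), rhs)
```

Because elect_option never consults the rule, no template
needs to be stored for it, which is the point of making the
COPT transformation implicit.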

b. Alternation

Rule:

a : "{" r1 "|" r2 "|" ... "|" rn "}"

Template 1:

NT,a => ALT,a

The transformation for the label NT,a is a single
node replacement; the operator NT is replaced with ALT,
and the rulename remains unchanged.

Template 2:

ALT,a => NT,rk    if user input valid
         ALT,a    otherwise

This label must be accompanied by user input
indicating which of the alternatives is desired; suppose for
the moment it is the kth. The transformation applied is a
single node replacement wherein the operator ALT becomes NT
and the alternation rulename is overwritten with the
rulename of the kth alternative. If the user input does not
correspond to any of the alternatives, the transformation
returns the node unchanged.
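
A minimal sketch of the two alternation templates follows.
Labels are plain (operator, rulename) tuples, and the mapping
from a selection character to an alternative rulename is an
assumed encoding, not something the text prescribes.

```python
def apply_nt_alternation(label):
    """Template 1: NT,a => ALT,a (operator overwritten, rulename kept)."""
    op, rule = label
    return ("ALT", rule) if op == "NT" else label


def apply_alt(label, alternatives, user_char):
    """Template 2: ALT,a => NT,rk if the input is valid, else unchanged.

    `alternatives` maps a selection character to the rulename of the
    corresponding alternative (a hypothetical encoding).
    """
    op, rule = label
    if op != "ALT":
        return label
    chosen = alternatives.get(user_char)
    return ("NT", chosen) if chosen is not None else label


# Hypothetical alternation rule "stmt" with two alternatives.
stmt_alternatives = {"a": "assign", "i": "if_stmt"}
```

Both functions are single node replacements: neither creates
children, and an invalid selection leaves the ALT label as it
was.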

c. Iteration

Rule: 

i  :  r 

Template 1:

NT,i => ITER,i ( NT,r ; IOPT,i )

While not required by the mathematical model, a
head node is created by the transformation for the label
NT,i to fulfill the deletion requirements. The two leaves
specified by the model are formed as the immediate children
of the head node in which the operator NT was replaced by
ITER. A side effect of the invariant creation of a head
node is that, while inconsistent with the model, terminal
information applicable to every real child in the iteration
sibling string, as opposed to the trailing IOPT child, could
be included in the iteration rule if an appropriate
extension were made to the R-ARGOT notation.

Template 2:

IOPT,i => NT,r ; IOPT,i

Triggered by the appropriate user input, the
transformation for the label IOPT,i replaces the node with a
pair of siblings which are the leaves required by the model.
Note that the rulename in the IOPT label is the same
rulename which bound its father. Thus, all children of the
ITER node, whether formed when the ITER node was bound or
subsequently when the IOPT node was expanded, are formed by
one of the transformations under the rulename stored in the
ITER node, as required.
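
The two iteration templates can be sketched as below. The
Node class and the function names are hypothetical; the
sketch only shows that every child of the ITER node is
created under the rulename stored in that node and that the
IOPT node always remains the last child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    operator: str
    rulename: str
    children: list = field(default_factory=list)


def bind_iteration(node, element_rule):
    """Template 1: NT,i => ITER,i ( NT,r ; IOPT,i )."""
    assert node.operator == "NT"
    node.operator = "ITER"
    node.children = [Node("NT", element_rule),
                     Node("IOPT", node.rulename)]
    return node


def expand_iopt(parent, element_rule):
    """Template 2: IOPT,i => NT,r ; IOPT,i.

    The new NT,r sibling is placed just before the trailing IOPT
    node, which is equivalent to replacing IOPT by the pair.
    """
    assert parent.children[-1].operator == "IOPT"
    parent.children.insert(len(parent.children) - 1,
                           Node("NT", element_rule))
    return parent


# Hypothetical iteration rule "stmts" over element rule "stmt".
stmts = bind_iteration(Node("NT", "stmts"), "stmt")
expand_iopt(stmts, "stmt")
expand_iopt(stmts, "stmt")
```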

d. List

Rule:

l : r1 x , x = < r2 | "["r2"]" | t >

Template 1:

NT,l => LIST,l ( NT,r1 ; LOPT,l )

The transformation for the label NT,l replaces
the operator NT with the operator LIST, forming a head node
as required by the model in the case the second
right-hand-side argument of the grammar rule is a nonterminal and in
every case by the deletion requirements. The required
leaves form a sibling string under the LIST node.

Template 2:

LOPT,l => NT,r2 ; NT,r1 ; LOPT,l      if x = r2
          COPT,r2 ; NT,r1 ; LOPT,l    if x = "["r2"]"
          NT,r1 ; LOPT,l              if x = t

The transformation for this label has three
forms, as indicated, for the three possible cases. In all
cases, the LOPT node being transformed is replaced with a
sibling string as shown, the nodes of which are the required
leaves. As in the IOPT transformation, the LOPT label
carries the same rulename as its father so that all children
created under a LIST head node are derived from a common
parent rule.
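
The three forms of the second list template can be written
as one function returning the sibling string that replaces
the LOPT node. The function name and the "nt" / "opt" / "t"
encoding of the second argument x are assumptions made for
this sketch.

```python
def expand_lopt(list_rule, r1, sep_kind, r2=None):
    """Return the sibling labels that replace LOPT,l.

    sep_kind is "nt"  when x = r2,
                "opt" when x = "["r2"]",
                "t"   when x is a terminal symbol.
    """
    tail = [("NT", r1), ("LOPT", list_rule)]
    if sep_kind == "nt":
        return [("NT", r2)] + tail
    if sep_kind == "opt":
        return [("COPT", r2)] + tail
    if sep_kind == "t":
        return tail
    raise ValueError(f"unknown separator kind: {sep_kind!r}")
```

In every form the last sibling is again LOPT,l, so the list
can keep growing under the same LIST head node.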

e.  Predefined 

Rule: 

p  :  pdf 

Template 1:

NT,p => TERM,p

The transformation for the label NT,p is a single
node replacement, the NT operator being overwritten with
TERM and the rulename remaining unchanged.

Template 2:

TERM,p => PDF(p,string),p    if PDF(p,string) valid
          TERM,p             otherwise

The label TERM,p must be accompanied by
appropriate user input before the transformation is applied.
The exact nature of the transformation applied is dependent
upon the predefined rulename, but certain characteristics of
the transformation may be generalized. The transformation
results in either a single node replacement or a possibly
many-leveled subtree; it may not generate siblings or a
forest of siblings. As regards the deletion restrictions,
the subtree created by a predefined function is considered a
single unit for editing purposes that is not subject to
internal deletions or insertions. System-provided
predefined rules, if the input is valid, invariably result in a
bound node or subtree of bound nodes; a free node in the
subtree would imply knowledge of language-specific grammar
rules which no general purpose predefined function could
have. User-supplied predefined functions, allowable as a
language-specific extension to the system, may admit such
free nodes; however, the language implementor is responsible
for ensuring the syntactic integrity of the AST is preserved
over such transformations.

If  the  input  accompanying  the  label  is  rejected 
by  the  predefined  function#  the  transformation  is  null  and 
the  node  is  unchanged. 
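
As a sketch of the second predefined template, the function
below applies a predefined function to the user's string and
leaves the label unchanged when the input is rejected. The
convention that a predefined function returns None on
rejection, the table of predefined functions, and the
parse_integer example are all assumptions for illustration.

```python
def parse_integer(text):
    """Stand-in predefined function: optionally signed decimal integer."""
    stripped = text.strip()
    digits = stripped[1:] if stripped[:1] in ("+", "-") else stripped
    return int(stripped) if digits.isdecimal() else None


def apply_term(label, user_string, predefined):
    """TERM,p => PDF(p,string),p if valid; TERM,p otherwise.

    `predefined` maps a predefined rulename to its function.
    """
    op, rule = label
    if op != "TERM":
        return label
    result = predefined[rule](user_string)
    if result is None:
        return label  # null transformation: node unchanged
    return (("PDF", result), rule)


# Hypothetical predefined rule named "number".
PREDEFINED = {"number": parse_integer}
```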


f. Undefined

Implicit Template:

NT,u => NT,u

The undefined label undergoes a null, implicit
transformation.

4. User Attention

Of the eleven transformations, six define the action
to be taken for the six possible nonterminal labels. The
remaining five, the second transformation template for each
of the five defined rule types, all require some form of
user attention prior to the application of the specified
template. The form of user attention required is dependent
upon the operator but generally may be characterized as
consisting of two parts: an indication that the user wishes to
direct attention to the current node, and a possibly empty
character string utilized by the transformation as an input
parameter. The five transformations requiring user
attention fall into three classes, as follows:

a.  IOPT,  COPT,  LOPT 

The three optional operators require simply that
the user elect to expand the optional node. Thus directing
attention to an optional node is sufficient for application
of the template and the character string parameter is not
required.

b.  ALT 

The  Alternation  operator  requires  that  the  user, 
after  directing  attention  to  the  alternation  node,  provide  a 
character  to  be  utilized  in  determining  which  of  the  possi¬ 
ble  alternatives  is  desired. 

c.  TERM 

The  TERM  operator  requires,  in  addition  to  the 
user's  attention,  a  character  string  for  processing  by  the 
predefined  rule  associated  with  the  node. 

The  exact  format  of  the  user  attention  parameter 
is  implementation  dependent,  but  is  summarized  abstractly  as 
follows,  by  operator: 


operator    user attention

COPT        <elect option>
IOPT        <elect option>
LOPT        <elect option>
ALT         <char>
TERM        <string>

5. Deletion and Insertion

Earlier it was asserted that templates defined in
accordance with an appropriate set of restrictions would
allow deletion of any subtree from the AST using only the
rulename of the subtree's parent node. We now verify that
assertion based on the templates as defined above.

Of the six rule types, three may be excluded from
consideration as potential parents of nodes to be deleted.
Undefined rules never form children and thus are never
referenced for deletion. Predefined rules are defined to
create subtrees which can be edited only as complete units.
Alternation rulenames never appear in bound nodes of the AST
since the alternation rulename in a free node is overwritten
with the rulename of the alternative rule chosen. Thus only
concatenation, iteration, and list rules remain as potential
parents of subtrees whose deletion is desired. The parent's
rule type in each of these three cases may be positively
identified by the parent node's operator: if the operator is
ITER, then the parent rule is an iteration; if LIST, then it
is a list rule; and if otherwise (either HEAD or a
predefined function), then the parent rule is a
concatenation. The templates for these three rule types allow
recreation of the original label which existed when the root
node of the subtree to be deleted was initially created.

A parent concatenation rule, upon initial expansion,
creates a fixed number of children, all of the forms NT,r
and COPT,r. By inspection, no transformation or sequence of
transformations on these labels for any of the six rule
types may create additional siblings under the parent
concatenation rule nor may they reorder the subtrees initially
created. Thus the initial fixed number and order of
children created remains constant. Suppose some subtree, say
the ith, under the concatenation rule parent is selected for
deletion. The sibling which was originally created by the
concatenation rule as its ith child may be reconstructed by
traversing the concatenation rule template until the ith
sibling list element is encountered. This sibling list
element contains the information by which the node replacing
the subtree to be deleted may have its operator and rulename
fields reinitialized.

Deletion of a subtree under an iteration rule parent
node is made possible by the consistent
manner in which the two iteration rule templates create
children of the parent node. The first child is created by
the first template and the deletion process for the first
subtree is similar to concatenation deletion. Subsequent
subtrees, up to the trailing IOPT,i node, are created by the
second template and the information necessary to recreate
any label may be retrieved from the first sibling list
element of that template. The IOPT,i child is invariant in
location and form and is not subject to deletion.

Deletion of the first subtree under a list rule
parent is handled in the same manner as the first subtree
under an iteration parent. Subsequent subtrees, up to the
LOPT,l node, are also similar to iteration rule subtrees
except that they may have been created in pairs.
Examination of the list rule's second template will reveal whether
subtrees after the first must be treated in pairs or may be
handled singly. In either event, the information necessary
to recreate any given child is available in the template.
The LOPT,l child is not subject to deletion.
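
The reconstruction argument for the three parent rule types
can be sketched as a single lookup. The layout of the
template table (a "first" sibling list per rule, plus a
"second" sibling list for iteration and list rules) and all
names are hypothetical; the caller is assumed never to ask
for the trailing IOPT or LOPT child, which is not subject to
deletion.

```python
def original_label(parent_op, parent_rule, index, templates):
    """Recreate the label child `index` (0-based) had when created."""
    entry = templates[parent_rule]
    first = entry["first"]
    if parent_op not in ("ITER", "LIST"):
        # Concatenation: fixed number and order of children.
        return first[index]
    if index == 0:
        # First child comes from the first template.
        return first[0]
    # Later children come from the second template, whose last
    # element is the trailing IOPT/LOPT label; list rules may
    # contribute them in pairs.
    repeat = entry["second"][:-1]
    return repeat[(index - 1) % len(repeat)]


# Hypothetical template table for three rules.
TEMPLATES = {
    "if_stmt": {"first": [("NT", "cond"), ("NT", "stmt"),
                          ("COPT", "else_part")]},
    "stmts": {"first": [("NT", "stmt"), ("IOPT", "stmts")],
              "second": [("NT", "stmt"), ("IOPT", "stmts")]},
    "args": {"first": [("NT", "expr"), ("LOPT", "args")],
             "second": [("NT", "comma"), ("NT", "expr"),
                        ("LOPT", "args")]},
}
```

Only the parent's operator and rulename are consulted, which
matches the claim that no other context is required.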

So far deletion has been concerned only with
"unparsing" an incorrectly formed subtree to a single
ancestor node so that the subtree may be correctly reconstructed.
For subtrees of concatenation rules this is the only form of
deletion which retains the valid derivation property.
Subtrees of iteration rules, however, are all derived from the
same label and thus are all syntactically equivalent when
viewed from their root. Further, the only restriction on
the number of iteration rule node subtrees is that there
must be at least one in addition to the IOPT node. Thus,
deletion of an iteration rule subtree, excepting throughout
the trailing IOPT node, could be realized as the actual
physical deletion of the entire subtree including the root
node, as long as at least one subtree remains. As a
corollary, a node properly labelled in accordance with the
iteration parent rule could be inserted in front of any node in
the iteration sibling string without violating the valid
derivation property. The insertion procedure requires the
same information as deletion, the rule type and rulename of
the parent node, in order to construct an appropriately
labelled node for insertion into an existing iteration node
sibling string.
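
Physical deletion and insertion under an iteration parent
might look like the following sketch. The sibling string is
modelled as a Python list whose last element is the IOPT
label; the other elements may be labels or any stand-in for
a bound subtree. Function names and this representation are
assumptions.

```python
def delete_iteration_child(children, index, element_rule):
    """Delete child `index`; unparse instead if it is the only subtree."""
    last = len(children) - 1
    if not 0 <= index < last:
        raise ValueError("index must select a subtree before the IOPT node")
    if len(children) > 2:
        # Other subtrees remain: physical removal is valid.
        return children[:index] + children[index + 1:]
    # Solitary subtree: replace it by its initial label.
    return [("NT", element_rule)] + children[1:]


def insert_iteration_child(children, index, element_rule):
    """Insert a fresh NT,r node in front of the node at `index`."""
    if not 0 <= index <= len(children) - 1:
        raise ValueError("can only insert in front of an existing node")
    return children[:index] + [("NT", element_rule)] + children[index:]


# "s1" and "s2" stand for already constructed subtrees.
kids = ["s1", "s2", ("IOPT", "stmts")]
```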

List rules whose second argument is a terminal
symbol form AST structures equivalent to iteration constructs
and thus physical deletion (as opposed to unparsing to a
single node) as well as insertion are valid operations.
List rules in general present a more complicated problem in
that subtrees after the first are formed in pairs. However,
extending the argument concerning syntactic equivalence of
subtrees to pairs of subtrees is straightforward and allows
physical deletion and insertion to apply to list rule
subtrees as well.

In summary, deletion is realized as a replacement
operation for all concatenation rule subtrees and for
solitary iteration and list rule subtrees, wherein the subtree
to be deleted is replaced by a single node which is a
reconstruction of the subtree's initial state. Under iteration
and list parents where other subtrees exist, deletion
results in the physical removal of the subtree or subtree
pair; reconstruction may be accomplished at the same or some
other location under the parent by a separate insertion
operation.


C.  DISPLAY  SCHEMAS 


Thus far a method of constructing an AST has been
developed utilizing transformations to expand nodes in
accordance with a set of templates sorted by rulename such
that the AST represents a valid derivation of the associated
grammar. Attention is now focused on displaying the AST; in
particular, a method is developed in this section by which
the valid derivation of the grammar which the AST represents
may be displayed.

Display of the AST is the result of a generalized
inorder traversal, beginning with the root node, with
terminal and nonterminal symbols being displayed in accordance
with schemas associated with each label. The display need
not be strictly preorder since provision is made to display
subtrees under a parent node in any order as directed by the
parent's rule schema. This capability is provided to allow
for the case where the evaluator may have to access the
subtrees in a different order than that implied by the syntax
of the target language.

Schemas are referenced by the rulename associated with
each bound and free node in a manner similar to the
referencing of templates so that the display associated with
a subtree is independent of the context of that subtree.

The valid derivation need not be displayed in its
entirety. For example, a means is provided to display all
undefined nonterminals as they occur in the AST as part of
the valid derivation. If the language implementor chooses,
however, he may elect to not display any of the undefined
nonterminals which appear in a partial grammar he is
implementing in its incomplete state.

In the following two sections, first the schema language
is defined and then the formation of schemas for each of the
ruletypes is developed.

1. Schema Language

There are three types of display information
provided for in the schema language: format control, literal
strings, and subtree indicators. A system for handling
comments has not yet been developed. However, it is envisioned
as an extension to the schema language and not as part of
the grammar for the target language.

Format control information is encoded mnemonically
in the double capital-letter strings "NL", "TB", and "UT",
interpreted respectively as "newline", "tab", and "untab".
UT simply causes a variable, "tabcount", to be decremented.
TB causes a tab control character to be transmitted to the
output device and increments "tabcount". NL causes a
newline character and "tabcount" tabs to be transmitted to the
output device. Format control information is provided for
readability only.
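
The behaviour of the three format controls can be sketched
directly from the description above. The Formatter class and
its method names are hypothetical, and output is collected in
a string rather than sent to a device.

```python
class Formatter:
    """Collects output and tracks the "tabcount" variable."""

    def __init__(self):
        self.tabcount = 0
        self.out = []

    def control(self, code):
        if code == "TB":
            # Transmit a tab and increment tabcount.
            self.out.append("\t")
            self.tabcount += 1
        elif code == "UT":
            # Only decrements tabcount; nothing is transmitted.
            self.tabcount -= 1
        elif code == "NL":
            # Newline followed by tabcount tabs.
            self.out.append("\n" + "\t" * self.tabcount)
        else:
            raise ValueError(f"unknown format control: {code!r}")

    def literal(self, text):
        self.out.append(text)

    def result(self):
        return "".join(self.out)


fmt = Formatter()
fmt.literal("begin")
fmt.control("TB")
fmt.control("NL")
fmt.literal("x")
fmt.control("UT")
fmt.control("NL")
fmt.literal("end")
```

Note that TB both emits a tab immediately and affects every
later NL, whereas UT affects only later NL controls.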

Literal strings are arbitrary character strings,
delimited by double quotes, that are transmitted directly to
the output device. Literal strings provide the mechanism
for the display of terminal and nonterminal symbols in the
derivation represented by the AST.

A subtree indicator, denoted by a dollar sign
followed by an integer interpreted as a child number, directs
that that subtree be entirely displayed prior to resumption
of display of the current schema. An optional display
field, consisting of an equals sign followed by a literal
string, may accompany the subtree indicator to provide the
means for displaying undefined nonterminals, the three
optionals, and TERM nodes, as described in the following
paragraphs.

An undefined nonterminal may appear for a variety of
reasons, the most common being as a placeholder in a partial
grammar. Since the rule for the nonterminal does not exist,
there can be no schema, so the optional field, if provided,
is invariably utilized. If not provided, nothing will be
displayed for the undefined nonterminal.

The three optional nodes, COPT, IOPT, and LOPT,
require special handling since there is nothing inherently
"optional" about a rule. Rather, the optional nodes are
placeholders to indicate to the user the possibility that
the rule specified may be invoked, if the user so chooses,
but also may be left uninvoked in a "complete" AST. Since
it is the father rule which holds the information that this
rule invocation may be an as yet unelected option, the
father rule schema contains the information, in the form of
an optional display field, to display the node accordingly.

The predefined rule referenced by a TERM node is in
general a language-independent system routine. As such, it
has no knowledge of the nonterminal name which it, when
invoked by the user on a string, is replacing in the valid
derivation. Since the father rule does have this
information, the father rule schema contains the optional display
field necessary to properly display, within the context of
the grammar, the rulename which the predefined rule will
replace. In other words, this facility allows the language
implementor to rename the predefined rule for display
purposes.

When an option has been elected or a TERM node
predefined rule has produced a bound node, both of which are
displayable in their own right, the optional field
associated with the subtree indicator is no longer necessary and
will be ignored by the display algorithm. While these nodes
remain free, however, the optional display field provides
the user the information he needs to expand these nodes, as
well as a logical symbol under which the GDE may place the
cursor to indicate the current node.


A subtree indicator which may reference one of the
three node types discussed above must, in order that a valid
derivation be displayed, include an appropriate optional
display field. The implementor may, of course, omit such a
display field, in which case nothing will be displayed for
the node. In the case of an undefined nonterminal this may
be the most pleasing result; in the case of optionals and
TERM nodes such a display will not accurately reflect all
free nodes in the AST that may be of interest to the user.
The omission of such an optional display field may be
regarded under normal circumstances as a mistake in the
language definition.

2.  Rule-Specific  Schemas 

Construction of schemas is a straightforward process
when keyed to rule type, since the schema subtree indicators
and literal strings must conform to both the R-ARGOT
grammar rule definition and to the transformation templates
associated with the rule definition in a consistent way. In
the schema constructions which follow, format control
information is ignored, but generally may be inserted into a
schema any place that a terminal symbol is allowed.
a.  Concatenation 

Rule:

c  :  x1 x2 ... xn ,   xk = { rk ! "["rk"]" ! tk }

Schema:

cs  :  s1 s2 ... sn ,

          "tk"               if xk = tk
          $j="[rulename]"    if child j is optional
   sk  =  $j="<rulename>"    if child j is predefined
          $j="(rulename)"    if child j is undefined
          $j                 otherwise

A single schema is required for the concatenation
rule and may be constructed, if all nonterminals are
realized as children in the order they are listed in the
R-ARGOT rule, as follows:

Reading the R-ARGOT concatenation rule from left to
right, for each symbol xk:

   if xk is a terminal symbol, copy it to
      the schema as a literal string;
   if xk is the jth nonterminal and is optional,
      write $j="[rulename]" to the schema;
   if xk is the jth nonterminal and is predefined,
      write $j="<rulename>" to the schema;
   if xk is the jth nonterminal and is undefined,
      write $j="(rulename)" to the schema;
   if xk is the jth nonterminal symbol, and is
      not optional, undefined, or a predefined
      rule, write $j to the schema.


This algorithm for the construction of a concatenation
schema is for the display of the entire valid
derivation. If display of an undefined nonterminal, for
example, is not desired, the subtree indicator for that
child could either be written without the optional display
field or be omitted entirely. While this algorithm assumes
that the implementor wrote the concatenation template such
that the children correspond in order to the nonterminals in
the rule, this need not be the case. The schema must know
the order, however, so that the display is an accurate
representation of the derivation obtained from the grammar.
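The construction can be sketched in a few lines of Python. This is
an illustration under stated assumptions only, not part of the GDE:
the Symbol class, its kind labels, and the function name
concatenation_schema are hypothetical, and the sketch assumes that a
subtree indicator is written $j with its optional display field
attached as ="...". Children are assumed to be realized in the order
in which the nonterminals are listed in the rule.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """One symbol of a concatenation rule (hypothetical representation)."""
    text: str             # terminal text or nonterminal rulename
    kind: str = "plain"   # "terminal", "optional", "predefined", "undefined", "plain"


def concatenation_schema(symbols):
    """Build a display schema by reading the rule from left to right."""
    parts = []
    j = 0  # index of the current nonterminal child
    for sym in symbols:
        if sym.kind == "terminal":
            parts.append(f'"{sym.text}"')          # literal string
            continue
        j += 1
        if sym.kind == "optional":
            parts.append(f'${j}="[{sym.text}]"')
        elif sym.kind == "predefined":
            parts.append(f'${j}="<{sym.text}>"')
        elif sym.kind == "undefined":
            parts.append(f'${j}="({sym.text})"')
        else:
            parts.append(f"${j}")                  # ordinary nonterminal
    return "".join(parts)


# A rule with one symbol of each kind (format control is ignored).
rule = [
    Symbol("program", "terminal"),
    Symbol("name", "predefined"),
    Symbol("decls", "undefined"),
    Symbol("externs", "optional"),
    Symbol("block"),
    Symbol("end", "terminal"),
]
schema = concatenation_schema(rule)
print(schema)
```

Omitting the display of an undefined nonterminal would correspond to
emitting a bare $j, or nothing at all, in the "undefined" branch.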

As an example of each of the possibilities
listed above, consider the concatenation rule

simple  :  "program" name decls [externs] block "end" .

where the nonterminal "name" refers to a predefined function,
"decls" is an undefined nonterminal, and "block" is a
well defined, non-optional, non-predefined nonterminal. The
schema for this rule, without any format control characters,
would be

"program"$1="<name>"$2="(decls)"$3="[externs]"$4"end"
b.  Alternation 

Rule:

a  :  "{" char1:x1 "!" char2:x2 "!" ... "!" charn:xn "}"

Schemas:

as1  :  "{alternation rulename}"

as2  :  "{ char1:rulename1 ! ... ! charn:rulenamen }"

Since the transformations defined for an alternation
rule are both single node replacements, the second
one of which results in the alternation rulename being
overwritten, it is clear that no semantic or schematic
information required in a sentence in the language, as
opposed to a valid derivation in general, may be associated
with the schema for an alternation rule, since once the
alternative choice is made by the user, the rulename and
thus access to the schema is no longer present in the AST.
Thus the schema for an alternation rule could have been
implemented as a subtree indicator optional field. We
choose to provide a pair of explicit display schemas
associated with the alternation rulename, however, to implement a
"help" mechanism. The first display schema consists simply
of a literal string composed of the alternation rulename in
curly brackets and is the schema normally used to display
the node. The second, optional at user request, is again
simply a literal string but with the alternative rules and
their associated keystrokes displayed in curly brackets.

For example, the following alternation rule

statement  :  { a:assignment ! c:conditional ! b:block }

would be displayed normally by the schema

"{ statement }"

or, if the user desired to see the alternatives and their
keystrokes, by

"{ a:assignment ! c:conditional ! b:block }"


c.  Iteration 


Rule: 


Schemas:

is1  :  $1

is2  :  "[iteration rulename]"


The iteration (as well as the list) rules differ
from concatenation in that they may have an indefinite
number of children requiring display. Since no terminals
are allowed in an R-ARGOT iteration rule and since every
child is formed independently of the others in the sibling
string, display of an iteration, while involving some work
on the part of the display algorithm to traverse all of the
subtrees one at a time, requires a pair of very simple
schemas. The first is simply a subtree indicator used for
display of all subtrees except the last. The subtree
indicator may include an optional field for undefined and
predefined rule display; from the transformation template
definitions it is apparent that no child of an iteration node can
be a concatenation optional node. The second schema is used
for display of the last child, invariably an IOPT node.


d.  List 


Rule:

l  :  r1 x "..." ,   x = { r2 ! "["r2"]" ! t }

Schemas:

ls1  :  $1

           $1$2                  if x = r2
ls2  :     $1="[rulename2]"$2    if x = "["r2"]"
           "t"$1                 if x = t

ls3  :  "[list rulename]"

The list rule requires three schemas in order to
properly display the unique format the list structure
conveys. Like the iteration rule, the list may have an
indefinite number of subtrees; however, R-ARGOT allows the
second argument to be a terminal symbol. Without this
facility the inclusion of the list rule type is hardly
justified, since the most usual use of the construct is to
separate grammatical entities with some punctuation mark.

The  first  schema  is  used  for  display  of  the 
first  child.  Subsequent  children  or  pairs  of  children, 
depending  on  the  specific  list  rule,  up  to  the  last  in  the 
sibling  string,  are  displayed  by  the  second  schema.  The 
display  algorithm  must  keep  track  of  which  children  it  has 
displayed  in  traversing  the  list  in  order  that  this  label 
schema  structure  display  the  sequence  of  subtrees  correctly. 
The  third  schema  is  used  for  display  of  the  last  child, 
invariably  an  LOPT  node. 

As an example of the list rule schemas, consider
the R-ARGOT rule

statements  :  statement ";" ... .

The schemas generated to display this rule would be

ls1  :  $1
ls2  :  ";"$1
ls3  :  "[statements]"

Note that a NL format control character would be appropriate
after the ";" terminal in ls2 and before the literal string
in ls3 in order to place each statement and semicolon pair
on a separate line.
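How a display algorithm might apply the three list schemas, with the
NL format control placed as suggested, can be sketched as follows.
The function name display_list and its arguments are hypothetical;
the children are assumed to be already rendered as strings, and the
trailing LOPT node is assumed to be implicit rather than passed in.

```python
NL = "\n"  # stands in for the NL format control character


def display_list(children, terminal, rulename):
    """Render a list node whose separator is a terminal symbol."""
    pieces = []
    for position, child in enumerate(children):
        if position == 0:
            pieces.append(child)                   # ls1 : $1
        else:
            pieces.append(terminal + NL + child)   # ls2 : "t" NL $1
    pieces.append(NL + f"[{rulename}]")            # ls3 : NL "[list rulename]"
    return "".join(pieces)


text = display_list(["a := 1", "b := 2"], ";", "statements")
print(text)
```

The position counter is the bookkeeping the display algorithm needs
in order to choose between the first and second schema.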

e.  Predefined 

A predefined display function should accompany
each predefined rule scanner. The display algorithm will
pass the subtree created by the predefined scanner to the
named display function. For example, the predefined scanner
"id" will scan an identifier, place it in the symbol table,
and fill in the TERM node with the information allowing
reference to that symbol table entry for the evaluator. On
display, the routine "idout" will be called to cause the
referenced identifier to be displayed.

D.  THE  LANGUAGE  DEFINITION  MODULE 

The Language Definition Module is the grammatical database
utilized by the Grammar Directed Editor in the construction
and evaluation of an AST. The Language Definition
Module has a fixed and an interchangeable component. The
fixed component consists of the system predefined rules and
functions. The interchangeable component, known as the
Language Definition, comprises the language-specific
grammar rules, templates, and schemas. In addition, the
Language Definition may optionally include user-supplied
predefined rules and functions supplementing or superseding
those permanently installed in the system.

1.  The Language Definition

The primary component of the Language Definition is
the internal representation of the language-specific grammar
as an ordered collection of grammar rules and their
associated templates and schemas. The Language Definition, apart
from user-supplied predefined rules and functions, consists
of a Rule Tree and a string table. The string table
contains the character string representation of the templates
and schemas for each rule. The Rule Tree is the ordering
mechanism for the grammar rules which provides access to the
templates and schemas in the string table. The Rule Tree is
a four-tiered hierarchy, the uppermost level of which is a
head node for the tree. The next level consists of a
sequence of head nodes, one for each defined grammar rule.
Under each grammar rule node is a pair of head nodes, the
first for the templates associated with the rule and the
second for the schemas. The fourth, bottom-most tier
consists of leaf nodes containing pointers to the template and
schema strings stored in the string table. The regularity
designed into the template and schema definitions for each
of the rule types allows accessing any leaf of the Rule Tree
by the Editor utilizing only the operator and rulename
information in an AST node label.
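The four tiers can be pictured with a small Python sketch. It is
an illustration only: the class and method names are hypothetical,
the template strings are placeholders rather than real templates,
and selecting a leaf by position is an assumption standing in for
the selection the Editor makes from the operator in the node label.

```python
class LanguageDefinition:
    """Rule Tree plus string table (hypothetical layout)."""

    def __init__(self):
        self.string_table = []  # character strings of templates and schemas
        self.rule_tree = {}     # tier 1 head: rulename -> rule node (tier 2)

    def _store(self, text):
        """Append a string to the table and return a pointer (index) to it."""
        self.string_table.append(text)
        return len(self.string_table) - 1

    def define(self, rulename, templates, schemas):
        # Tier 3: a pair of head nodes; tier 4: leaves holding pointers.
        self.rule_tree[rulename] = {
            "templates": [self._store(t) for t in templates],
            "schemas": [self._store(s) for s in schemas],
        }

    def template(self, rulename, index):
        return self.string_table[self.rule_tree[rulename]["templates"][index]]

    def schema(self, rulename, index):
        return self.string_table[self.rule_tree[rulename]["schemas"][index]]


definition = LanguageDefinition()
definition.define(
    "statements",
    templates=["<template 1>", "<template 2>"],       # placeholders only
    schemas=["$1", '";"$1', '"[statements]"'],
)
print(definition.schema("statements", 2))
```

Because every rule node has the same shape, a rulename and a
position are enough to reach any leaf without searching the tree.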

Appendix D is an Intermediate-Level Language Definition
Grammar. Encoded by hand into a Language Definition as
shown in Appendix E, the ILD Grammar provides the means to
generate a Grammar Directed Editor for the construction of
ASTs representing language-specific Language Definitions.
When such an AST is evaluated by the predefined function
ILD, the result is a language-specific Language Definition
which may be installed in the Language Definition Module and
utilized to construct applications-oriented ASTs in the
language defined by the grammar. Appendix F presents a simple
example of such an applications-oriented Language Definition
from which ASTs representing strictly formatted
memoranda may be constructed utilizing the GDE.

The ILD Grammar allows definition of grammars on an
assembly-language level, i.e., many details which are
computable from the R-ARGOT grammar rule must be entered by the
user. For example, in the construction of an iteration rule
the user is required to enter "rulenamel" and "i-rulename"
in a consistent manner throughout the formation of the
templates and schemas. However, at this low level the
mechanisms for checking such consistency do not exist. Thus the
ILD Grammar is seen as a flexible but error-prone tool
suitable for use primarily as a bootstrap mechanism for the
definition and implementation of a High-Level Language
Definition Grammar which automatically derives as much
information from the R-ARGOT rule as is possible. For
grammars in which all nonterminal children of concatenation
rules are to be created and displayed in the order listed in
the rule, an extended R-ARGOT notation which provided the
facility for inclusion of format control information and a
means for specification of predefined functions as head
nodes of concatenations would allow such automatic
derivation. Development of such an extended notation, as well as
of the corresponding HLD Grammar and function, is deferred
until the symbol table and evaluator designs are complete.

2.  Predefined Rules

The set of system predefined rules provides the user
a mechanism for entering strings representing simple, common
constructs, such as identifiers and numbers, as well as more
involved constructs, such as expressions, which even though
composed of many parts and perhaps generating multinode
subtrees in the AST, may be most conveniently viewed by the
user as representing single logical units. Predefined rules
are built-in, optional extensions to the Language Definition
which provide the language implementor with a set of
primitives upon which he may base his grammatical constructs.
The set of predefined rules is modifiable and extensible by
the language implementor through inclusion, as an adjunct to
the grammar definition, of a set of predefined rules which
supersede or complement the set permanently installed in the
Language Definition Module.

Predefined rules may be viewed as a deviation from
the grammar directed editing philosophy espoused throughout
this work. The use of predefined rules allows the entry,
after all, of syntactically incorrect strings which are not
immediately, in the sense of character-at-a-time immediacy,
detected and rejected as invalid. For example, compare a
"pure", character-at-a-time grammar directed editor with a
predefined rule augmented GDE on the terminal <string>,
defined for illustration to be the concatenation of any
characters except a space, and terminated by a carriage
return. In the pure system, each character is examined and
its validity checked as it is typed. In this example, if
the user enters a string of valid characters and then a
space, he is immediately informed that the space is
unacceptable and is able to proceed without retyping that
portion of the string thus far entered. The predefined rule
system, however, would require that the entire string of
symbols, including the incorrect space, be entered before
rejecting it, and the user would have to retype the
corrected string in its entirety.
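The contrast can be made concrete with a small Python sketch of the
<string> example. Both function names are hypothetical, and the
sketch models only the acceptance decision, not the terminal
interaction: the pure editor rejects a space the moment it is typed
and keeps the prefix, while the predefined rule judges the whole
string once the carriage return arrives.

```python
CR = "\r"  # carriage return terminates the <string> terminal


def pure_entry(keystrokes):
    """Character-at-a-time checking: a bad character is refused at once."""
    accepted = []
    rejections = 0
    for ch in keystrokes:
        if ch == CR:
            return "".join(accepted), rejections
        if ch == " ":
            rejections += 1   # user is told immediately; prefix is kept
            continue
        accepted.append(ch)
    return None, rejections   # no terminating carriage return seen


def predefined_entry(line):
    """Whole-string checking: accept or reject only after the return."""
    if not line.endswith(CR):
        return None
    body = line[:-1]
    return body if " " not in body else None  # None means "retype it all"


print(pure_entry("ab cd" + CR))
print(predefined_entry("ab cd" + CR))
```

In the second case the AST is left untouched on rejection, which is
the property relied upon in the arguments that follow.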

We  grant  that  grammar  directed  editing  down  to  the 
smallest  indivisible  unit,  the  character,  has  a  certain 
appeal.  However,  our  predefined  rule  compromise  is 
motivated  by  several  advantages  and  mitigating  arguments: 




a.  The time lapse between entering even a large
predefined rule input string, such as a complex expression,
and re-entering it if it is rejected as incorrect, is short.

b.  The time lost in a predefined rule system in
retyping the usually short input strings accepted by most
predefined rules is offset by the time that would be lost in
a pure system that requires control characters to guide the
tree building via the language definition through the
various alternatives involved in the larger grammatical
constructs, such as expressions, that can easily be handled by
predefined rules.

c.  The syntactic integrity of the AST is always
preserved by the system predefined rules since no change to
the AST is made until the syntactic validity of the entire
input string is confirmed.

d.  Predefined rules simplify the language
implementor's task by raising the level of the lowest
grammatical constructs that must be defined in the grammar.
Instead of having to work clear down to the character level,
predefined rules provide as primitives the facilities for
handling groups of characters, such as numbers, identifiers,
and strings, which are the basic building blocks of data
structures in general and programs in particular.

e.  Given automatic lexical analyzer and parser
generators, predefined rules for the class of grammatical
constructs envisioned are easily built.




f.  The suitable choice of predefined rules frees
the language implementor from long-winded, needlessly
detailed grammatical constructions for a wide variety of
regularly-expressible productions. Grammars for language
definitions, given such a set of easily understandable
primitive constructions, would be more transparent and easier
for the user to assimilate.

It is recognized that taking the predefined rule
approach to its extreme limits could result in a
compiler-like editor wherein huge segments are submitted for analysis
to exceedingly complex predefined rules, thereby negating
the benefits to be gained from a more rational grammar
directed editing environment. However, within the
guidelines presented here, the predefined rule approach has
distinct advantages and leaves open avenues for exploration to
the language implementor.

3.  Predefined  Functions 

Nodes in the AST undergoing evaluation fall into
one of three categories: undefined, head, and function. The
class of undefined nodes includes all free nodes which may
still exist in the AST. Head nodes are the HEAD, ITER
and LIST operator nodes created for synthesis of the AST,
all of which are synonymous to the evaluator. Head nodes
have no computational capabilities during the evaluation
process but rather provide structure to the AST. Function
nodes have as their operator one of the predefined
functions. Function nodes are generated by concatenation
and predefined rules during synthesis of the AST and result
in calls to the corresponding predefined function during
evaluation. Function nodes may be leaves, as in nodes which
reference symbol table entries, or they may be interior
nodes. If interior, function nodes must have the number,
order, and type of subtrees expected by the predefined
function.

The set of predefined functions defines the range of
computational power available to the evaluator and thus
limits the capabilities available to the user of the GDE. A
proposed set of system predefined functions, based on the
primitives discussed throughout [Pratt, 1975], is presented
in Appendix G. This set of system functions may be
augmented by the language implementor through additional or
superseding function definitions included as extensions in
the Language Definition.


IV.  PROGRAMS  AS  DATABASES


A.  INTRODUCTION 

The material contained in this chapter was originally
developed during the search for a solution to a particular
problem: namely, that of storing the tree representation of
the synthesized program in secondary storage, with
complicated links to other data structures recorded in the
leaves, in such a way that pointer and reference integrity
could be maintained. This problem is aggravated by the
consideration that such a stored structure might well be
reloaded at a time when the physical contents of shared
memory spaces currently in use by the system are quite
different from the environment existing at the time that the
tree structure was originally created.

Once this problem was recognized as being a database
management problem, to which known techniques of database
design were applicable, the solution was straightforward.
The database design techniques described throughout this
chapter are taken from [Kroenke, 1977]. The relatively
unorthodox view of programs as complex databases afforded by
this insight, however, is of more general interest since it
provides a new perspective on the nature of programming
systems. In particular, these considerations provide some
justification for the hope that grammar-driven tree
synthesizers are capable of building up a
language-independent semantic structure.

B.  PROGRAMS  AS  COMPLEX  RELATIONSHIPS 

In viewing programs as databases, we first recognize
that the semantic contents of a program must be accessed by
two entities: the human reader or writer, and the processor
intended to execute the program. Comments excluded, the
information available to these two entities is almost
identical: that is, the human user can predict exactly the
operation of the processor for a given program, and the
processor deterministically executes the encoded intentions
of the programmer. So without loss of generality, we may
initially consider the program as a database accessed by the
processor. In the case of a machine language program, the
processor is the real machine on which the program is to
execute. For a higher-level language, the processor is the
hardware-software combination, or virtual machine, which is
capable of translating and executing the program.

The "semantic content" of the program is the collection
of potential evaluations which the processor may be required
to perform throughout the course of execution. For the
moment, we disregard the order of execution. Each
evaluation consists of the selection of one of many
primitive operations which the processor is capable of
performing, and the application of that chosen primitive
operation to a number of arguments, contained in one or more
registers, or memory locations addressable in some way.

Upon reflection, it is clear that both the set of
primitive operations and the set of addressable memory
locations are databases in their own right. The keyname, or
code by which an entry can be uniquely located, for the set
of primitive operations is the operation name, or opcode,
and that for the collection of potential arguments is the
address.

Clearly, the set of potential evaluations is, in the
terminology of database theory, a complex relationship
between primitive operations and registers. A given
operation may be applied to many different sets of arguments
within the course of a program execution, and a given
register may be the argument for a number of different
operations. There is no functional relationship between
items of the two databases in either direction, which means
that neither keyname can be used to uniquely identify an
item in the complex relationship between them.

C.  DECOMPOSITION  OF  THE  EVALUATION  RELATION 

Standard database design techniques specify several ways
by which each of the elements of a complex relationship
between two databases can be referred to in a systematic and
unambiguous way during database access. Two general methods
of approach are used. One is to (arbitrarily) force the
relationship to be simple (... only), by rejecting from the
allowed range of possibilities any members of the
relationship which would cause the relationship to be
complex. In this case, the keyname for one of the
underlying databases can be used to unambiguously refer to
members of the relationship as well. The second method is
to decompose the relationship into two simple relationships
by constructing an intersection database.

There exist programming systems in which the first
strategy is adopted. For instance, if the restriction is
made that registers may not be re-used, so that at most one
primitive operation is applied to a given register, a purely
functional, or no-assignment, programming system is
obtained. In such a system, the only named semantic
elements are functions and constants (which may be regarded
as functions). Registers need not be named since whenever
one is needed, it can be drawn from a pool, used once, and
discarded by the processor.

This  approach  is  considered  mathematically  elegant,  but 
it  is  not  much  in  use  in  non-academic  programming  systems. 

In the second approach, an intersection database is
created, consisting of one entry for each distinct member of
the complex relationship. As a minimum, in order to allow
reference to the generating databases, each entry in the
intersection database must contain the keynames for those
entries in the original data sets with which it is
associated. Thus, for a programming notation, each entry in
the intersection database must contain, at a minimum, an
opcode and a register address for each argument, in some
form.

The archetypical entry for the intersection database
corresponding to the evaluation relationship is thus:

OPCODE  ADDRESS(1)  ADDRESS(2)  ...  ADDRESS(N)

This format is recognizable as the atomic unit of notation
for most common programming systems, from machine code to
high level languages. Each single such entry corresponds to
what is normally referred to as an instruction. In summary,
we assert that a program is nothing more than the
intersection database for instances of the evaluation of
accessible operands by the primitive operations available to
the evaluating processor.
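This view can be illustrated with a short Python sketch in which the
two parent databases are dictionaries and the program is a list of
intersection entries. All names here are hypothetical, and the
convention that the first address receives the result is an
assumption made for the illustration only.

```python
import operator

# Parent database 1: primitive operations, keyed by opcode.
operations = {"ADD": operator.add, "MUL": operator.mul}

# Parent database 2: addressable registers, keyed by address.
registers = {0: 2, 1: 3, 2: 0, 3: 0}

# Intersection database: one entry per evaluation,
# OPCODE ADDRESS(1) ... ADDRESS(N), first address = destination (assumed).
program = [
    ("ADD", 2, 0, 1),
    ("MUL", 3, 2, 1),
    ("ADD", 3, 3, 0),
]


def evaluate(entry):
    """Apply the named primitive operation to the addressed registers."""
    opcode, destination, *sources = entry
    arguments = [registers[address] for address in sources]
    registers[destination] = operations[opcode](*arguments)


def entries_with_opcode(opcode):
    return [entry for entry in program if entry[0] == opcode]


def entries_with_address(address):
    return [entry for entry in program if address in entry[1:]]


for entry in program:
    evaluate(entry)
print(registers)
```

Both lookup functions can return more than one entry, which is the
sense in which neither keyname alone identifies a member of the
relationship.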

D.  CONTROL  STRUCTURE 

We have heretofore ignored the question of how the order
of execution of the evaluations is to be specified within
the program (the basic elements of which are now seen to be
entries in an intersection database). This order
corresponds to the logical access sequence of the set of
instructions. Thus, we may equate the ordinary notion of
the control structure of a program to the database-oriented
notion of a logical access structure for the program
database. The simplest access mechanism for a database is
to order it as a simple sequence. Under this protocol, the
elements of the database will be presented to the accessing
entity in a strictly invariant sequence.

Such an accessing structure is realized in such simple
programming systems as that of a keystroke-programmable
calculator. A sequence of keystrokes can be entered and
automatically reproduced at will, but there is no
possibility of automated branching.

Such programming systems are fundamentally limited in
mathematical computational power. The simplest modification
to such an access regime is to allow conditional branching,
so that a part of the instruction sequence may be repeated
or skipped, based on the contents of a register at the time
the branch is reached.

Machine and assembly-level programming systems, as well
as such high-level languages as BASIC and FORTRAN, are
organized on such a plan.
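A sequential access regime with conditional branching can be
sketched in Python as a loop over a program counter. The opcodes,
the register names, and the function name run are all hypothetical;
the sketch shows only that one branch instruction suffices to repeat
part of an otherwise invariant sequence.

```python
def run(program, registers):
    """Access instructions in sequence, except where a branch intervenes."""
    pc = 0  # position in the logical access sequence
    while pc < len(program):
        opcode, *addresses = program[pc]
        if opcode == "JNZ":                      # branch if register nonzero
            register, target = addresses
            pc = target if registers[register] != 0 else pc + 1
            continue
        destination, left, right = addresses
        if opcode == "ADD":
            registers[destination] = registers[left] + registers[right]
        elif opcode == "SUB":
            registers[destination] = registers[left] - registers[right]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc += 1
    return registers


# Sum the integers n, n-1, ..., 1 into "acc".
program = [
    ("ADD", "acc", "acc", "n"),
    ("SUB", "n", "n", "one"),
    ("JNZ", "n", 0),
]
result = run(program, {"n": 4, "acc": 0, "one": 1})
print(result["acc"])
```

Without the JNZ entry the same three-entry database could be
accessed only once, in order, as on the calculator described above.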

E.  STRUCTURED  PROGRAMMING  SYSTEMS

The disadvantage of a sequential access mechanism is
that the resulting database does not have local integrity.
Instruction sequences which may be logically adjacent under
certain circumstances are not necessarily physically
adjacent. This access organization presents no real
disadvantages for the machine processor with a random-access
architecture, but can be quite confusing for the human
programmer. To render the program database more accessible
to the user, the notion of structured programming was
developed. This organizational technique consists of
organizing the access of a program database in a
hierarchical (tree-like) manner, so that program control
follows a hierarchical program structure which can be
expressed as a string generated by a context-free grammar
(and thus has an associated physically hierarchical
structure induced by the grammar). Such program control
facilities as functions and subroutines were the earliest
"structured constructs". The syntax of such languages as
PASCAL and ALGOL, however, was consciously designed to
facilitate the expression of a hierarchical control
structure, and to make the expression of a disordered,
sequential control structure less attractive than the use of
"structured" control operators. It is this historical
development which encourages us to hope that a
language-independent semantic tree structure may be built using a
grammar-driven tree editor. Basically, we note that it has
become a conscious design principle in the development of
structured programming languages to ensure that program
control flow follows the syntactic organization of the
language. The underlying sets of primitive operators have a
great deal in common. Language-dependent primitives can be
added to the set available to the processor and evaluated
without regard to the specific syntax by which they are
expressed, provided that the overall control structure of
such additional primitives is also hierarchically organized.

F.  PHYSICAL  REPRESENTATION  OF  A  TREE-STRUCTURED  PROGRAM 

We are left with the problem of physically representing
a tree-structured program in a sequentially organized
physical memory space. The problems encountered are
precisely those encountered when attempting to implement any
hierarchically organized intersection set. They stem from
the requirement to refer, directly or indirectly, to the
entries in the parent databases from more than one place in
the intersection database. Two general strategies, each
with its own advantages and disadvantages, are currently in
use in database management systems.


1. Sequential Tree Representation

This strategy is implemented by representing the tree as a
linear list of nodes and their contents in preorder sequence.
References to the parent databases are embedded in the
listing by keyname. The complexity of the relationship
implies that each such keyname must be repeated many times
throughout the list. Special delimiters are used between
node listings to indicate whether the next node is a child,
sibling, or uncle of the last. If one of the keynames is to
be changed, a search of the listing must be made to find all
of its occurrences. A second major disadvantage is that in
order to access any part of the list, the list must be
traversed sequentially from the beginning. On the other
hand, no pointers need occur anywhere in the list, so that it
can be moved about freely from one place to another without
change.
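The sequential strategy can be sketched as follows. This is only an illustration under stated assumptions: the delimiter tokens DOWN, NEXT and UP, the Node class, and the function names are introduced here and do not come from the original design. An "uncle" transition appears in this encoding as UP followed by NEXT, and keynames are assumed never to collide with the delimiter tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keyname: str                                  # symbolic reference into a parent database
    children: list = field(default_factory=list)


def to_sequential(node):
    """Return a pointer-free preorder listing of the tree rooted at node."""
    out = [node.keyname]
    if node.children:
        out.append("DOWN")                        # next entry is a child
        for i, child in enumerate(node.children):
            if i:
                out.append("NEXT")                # next entry is a sibling
            out.extend(to_sequential(child))
        out.append("UP")                          # return to the parent's level
    return out


def occurrences(listing, keyname):
    """Changing a keyname means scanning the whole listing for every use."""
    return [i for i, token in enumerate(listing) if token == keyname]


# Hypothetical tree for an assignment such as "x := x + one".
tree = Node("assign", [Node("x"), Node("plus", [Node("x"), Node("one")])])
listing = to_sequential(tree)
```

The listing contains no addresses, so it can be copied or stored unchanged; the cost is that both lookup and renaming require a linear scan.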

2. Linked Representation of Trees

Trees are represented in this strategy by nodes
linked together using pointer fields within each node. A
pointer is either the absolute address of the entity pointed
to, or an offset or array subscript which can be used by
routines in the system to calculate such an address. The
salient feature of a pointer reference is that it allows
reference by some mechanism which is independent of the
value of the referenced entity. Thus, the value of the
entity itself can be changed without changing all of the
references to it, which are still valid (provided, of
course, that the change is made without physically moving
the changed record). When the tree itself is represented by
means of nodes linked with pointers, it is common to link
the leaves of the tree to the parent databases with pointers
as well. It is assumed that a means exists to distinguish
such external links from the internal links defining the
tree structure itself. This representation has as one major
advantage the ability to be quickly traversed (by following
pointers). Another major advantage of this strategy is that


Deletion of information is somewhat more difficult, but can
be accomplished by constructing and maintaining
cross-reference lists (inverted lists) which contain pointer
references to all nodes in the tree referring to a given
record in the parent database. The primary disadvantage of
such a representation is that the structure cannot be moved
or stored without a great deal of pointer modification. The
use of relative pointers is an inadequate solution, since
the consistency of references to the parent databases, which
need to be moved and managed as separate entities, must
still be maintained.
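A minimal sketch of the linked strategy, with a cross-reference (inverted) list kept on each parent-database record, might look like the following. The class and function names (Record, TreeNode, release_subtree) are assumptions made for illustration; Python object references stand in for the pointers discussed above.

```python
class Record:
    """An entry in a parent database (for example, an identifier)."""

    def __init__(self, value):
        self.value = value
        self.referrers = []               # inverted list: tree nodes pointing here


class TreeNode:
    def __init__(self, record=None):
        self.children = []                # internal links defining the tree
        self.record = record              # external link into the parent database
        if record is not None:
            record.referrers.append(self)


def release_subtree(node):
    """Drop every external reference held by a subtree.

    Returns the records that are left with no referrers at all.
    """
    orphans = []
    stack = [node]
    while stack:
        current = stack.pop()
        stack.extend(current.children)
        if current.record is not None:
            current.record.referrers.remove(current)
            if not current.record.referrers:
                orphans.append(current.record)
    return orphans


x = Record("x")
root = TreeNode()
left, right = TreeNode(x), TreeNode(x)
root.children = [left, right]
x.value = "total"     # a single update; both leaves see it through their links
```

Changing the record's value touches one place only, which is the advantage claimed for pointer references; the inverted list is what makes deletion tractable.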

3. A Hybrid Strategy for Tree Representation

An examination of these characteristics indicates
that the linked representation is preferable when changes
are to be made to either the parent or tree databases, but
that the sequential representation is preferable when the
database is to be transmitted from one location to another,
or stored unchanged for a relatively long period of time.
(Storage is equivalent to transmission from one time to
another, and is thus logically the same problem as that of
movement.)

We conclude that the linked representation is an
appropriate representation for the program tree during
synthesis and evaluation, but that the program tree should
be moved (or stored on secondary storage) in sequential,
pointer-free format. Links to the parent databases are
converted from pointer references to reference by keyname.
The next section addresses the problem of how conversion
between the two representations can, in general terms, be
accomplished.

G. PROCEDURAL REPRESENTATION OF DATA

In order to incorporate these ideas into a feasible
design, we consider the facilities that would have to exist
in such a system. Since the program tree is to be operated
on in main memory with a linked representation, we may
assume that a data manipulation package exists which is
capable of synthesizing and maintaining all of the pointers
required to keep the linked structures coherent and
consistent. Consider the process of removing a sequentially
organized tree structure from secondary storage and loading
it into internal memory. This process must consist of
ordering a particular series of function activations with
particular arguments from the data manipulation package,
causing the desired structure to be built within physical
memory. The sequential representation is seen to be nothing
but a program for the data manipulation package, which is
itself a processor with a number of primitive operations.

Moreover, a strictly sequential control protocol for
this program is possible, given a reasonably powerful set of
primitives in the data manipulation package, since a tree
can be synthesized in strict pre-order sequence (the parent
for each child exists at the time of the child's synthesis).

We conclude that the appropriate secondary
representation for a program tree is as a sequential list of
instructions, to be translated by some simple interpreter
into a series of calls to the data manipulation package.

The offload, or transmit, process consists of a pre-order
traversal of the linked representation, emitting the
appropriate instructions for recreating the skeleton of the
tree and filling in the contents of each node as it is
reached. At the same time, references can be removed from
the appropriate cross-reference lists, triggering removal of
the data item from the parent database when a reference
count of zero is reached. During onload, the skeletal
structure of the tree is recreated, and external references
in symbolic form are reloaded into the appropriate parent
database. Pointer and cross-reference list creation and
maintenance are performed automatically by the pre-existing
data manipulation package.

The secondary representation can thus be viewed either
as data, representing the tree in linear format, or as a
program for the data structure manipulation package which
will cause a logically equivalent tree to be reconstructed
in available memory.
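The dual view of the secondary representation, as data and as a program, can be illustrated with a small sketch. The instruction names OPEN and CLOSE, the Node class, and the functions offload and onload are hypothetical; the sketch also omits keyname resolution and reference counting, which the data manipulation package is assumed to handle.

```python
class Node:
    def __init__(self, contents, parent=None):
        self.contents = contents          # stands in for the node's association list
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def offload(node):
    """Pre-order traversal that emits a pointer-free instruction stream."""
    yield ("OPEN", node.contents)
    for child in node.children:
        yield from offload(child)
    yield ("CLOSE", None)


def onload(instructions):
    """Interpret the stream as a strictly sequential series of build calls."""
    root = current = None
    for op, arg in instructions:
        if op == "OPEN":
            current = Node(arg, current)  # the parent already exists here
            if root is None:
                root = current
        elif op == "CLOSE":
            current = current.parent
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return root


original = Node("if")
Node("cond", original)
then_part = Node("then", original)
Node("call", then_part)

stream = list(offload(original))          # linear, pointer-free form
copy = onload(stream)                     # a logically equivalent tree
```

Because the stream carries no addresses, offloading the rebuilt tree yields the same stream again, which is the sense in which the two trees are logically equivalent.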

As a beneficial side effect, if the capability is
installed to allow the onload and offload translators to
read from or write to strings in main memory, the described
system provides an easy way to copy or move subtrees, as well
as to encode tree-building templates efficiently. In fact,
the proposed mechanism becomes the method of choice for any
and all movement of tree structures from one location or time
to another, since the data in the transmitted stream is
entirely logical, containing no reference to any
implementation details. The process would even allow
internal representations to be transmitted from one
installation to another with a completely different
implementation, since all implementation-dependent data is
removed during the offload process and reinserted during the
onload process.

H.  SUMMARY 

In this section we have viewed programs as specialized
databases, and have found that standard database models
correspond nicely to various programming language styles.
Two fundamental conclusions have been reached. The first is
that it seems very likely that grammar-driven tree editors
can be used to produce trees representing the control
structures for common programming languages in a
syntax-independent, directly-evaluable format. This hope is
based on the direct expression of hierarchical control
structures by the syntactic hierarchy implicit in the
defining grammars of current programming languages, and the
recognition that a small set of such control structures
provides the common case for current language design.

The second result is the solution to a technical
problem: that the appropriate format for such program trees
is in linked form when the tree is undergoing modification,
and as a sequential, procedural, pointer-free list of
instructions when the tree is being stored, or transmitted
from one point to another.

V. A PROTOTYPE SYSTEM DESIGN


In this section, the design for a prototype system
demonstrating the feasibility of the ideas developed in
previous chapters is described. Since the implementation of
the described system is, at present, incomplete, the design
is presented only in broad outline. A full description of
the demonstration prototype will be provided as a Technical
Report when the initial implementation is complete.

The approach taken is to first describe a complete system
for a grammar-driven, language-independent programming
environment, and then select a subsystem for implementation
as a prototype feasibility study. The prototype subsystem
will be used to generate statistics concerning memory size
and computational efficiency, as well as to refine the user
interface, with the possibility remaining of extending the
prototype to a more complete implementation at a future
time.

A  basic  block  diagram  of  the  complete  system  is  provided  as 
Figure  5. 

A. SYSTEM MODULES.


The proposed system consists of the following modules:


1. Data Structure Support Module.

This module contains packages of functions, each
package implementing a specific abstract data type needed by
the remainder of the system. At a minimum, the abstract
data type packages needed include one supporting an
indefinite number of indefinitely large association lists
(to represent the contents of tree nodes), and one
supporting general ordered trees, optimized toward
reasonably efficient traversal in all directions. In
addition, the tree support package must include a facility
for linking the leaves of trees to other data items, such as
strings, symbol table entries, numerical contents, and so
on. Each tree node (internal as well as leaf) must be
linkable to an association list representing the contents of
the node.

In addition to supporting tree and association list
data types, this module is responsible for supporting any
additional data types for which the need arises and which
are not supported directly by the language used for
implementation. (In particular, the implementation
currently being developed requires a very primitive string
table which serves as a rudimentary symbol table.)


2. Grammar-Driven Environment Module.

This  module  provides  an  editor-like  interface  for 


to evaluate a particular program structure, and movement of
Abstract Syntax Trees from secondary to primary storage and
back again. A major component of this module is the
grammar-driven synthesizer itself.

3. Memory Management Module.

This module comprises the actual system primary
memory itself, which is used to store the LD (Language
Description) and AST (Abstract Syntax Tree) currently in
use. In addition, the primary memory module contains the
data structures being manipulated by the Data Structure
Support Module.

4. File Management Module.

This module implements a single-user workspace on
secondary storage which contains all of the LD's available
to the user, as well as all of the AST's which may have been
previously created and saved. These components are stored
in sequential, pointer-free format as discussed in Chapter
IV.

5. Input/Output Manifolds.

These modules manage the system input and output
streams, which may be redirected as required by components
of the system (including the user) to various physical
devices. The input stream may be taken from the keyboard, a
file on secondary storage, or a string in primary storage.
This assignment may be changed dynamically during the
operation of the system. Similarly, the output stream may
be dynamically directed to the CRT, a string in primary
storage, or a file on secondary storage. (The term
"manifold" is used to suggest that these functions may be
thought of as three-position switches, the setting of which
may be changed at will during system operation.)
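The "three-position switch" idea can be sketched for the input side as follows. This is an illustrative assumption, not the prototype's interface: the class InputManifold and its method names are invented here, and Python's standard sys.stdin, open, and io.StringIO stand in for the keyboard, a secondary-storage file, and a string in primary storage.

```python
import io
import sys


class InputManifold:
    """A switchable input stream: keyboard, file, or in-memory string."""

    def __init__(self):
        self._stream = sys.stdin

    def _switch(self, stream):
        # Close the previous source unless it is the keyboard itself.
        if self._stream is not sys.stdin:
            self._stream.close()
        self._stream = stream

    def from_keyboard(self):
        self._switch(sys.stdin)

    def from_file(self, path):
        self._switch(open(path, "r"))

    def from_string(self, text):
        self._switch(io.StringIO(text))

    def readline(self):
        return self._stream.readline()


manifold = InputManifold()
manifold.from_string("first line\nsecond line\n")
line = manifold.readline()
```

A consumer such as the onload translator only ever calls readline, so it need not know which of the three positions is currently selected.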

6.  Onload  and  Offload  Translators. 

These modules, controlling the Data Structure
Support facilities, convert the sequential data
representations stored on secondary storage to the linked
representation needed when an LD or AST is loaded into
primary memory, and vice versa. As a secondary feature,
since the input and output streams may originate from or be
directed to internal strings, these modules can be used to
"quote" or "unquote" tree structures, as when a template is
translated into an actual subtree replacement.

B.  PRE-EXISTING  MODULES. 

The current implementation is being made using the C
Programming Language on a PDP-11 with the UNIX Operating
System. (UNIX is a trademark held by Bell Laboratories,
Inc.) This software combination provides a C-accessible
interface to memory and file management facilities. In
addition, a complete library of string handling and
input/output functions is available. In consequence, the
memory and file management modules described above may be
thought of as already in existence, for the purpose of
describing the prototype subsystem. In addition, keyboard
and CRT interfaces are already operational: under the UNIX
operating system, hardware interfaces are mapped into the
system as files with conversion routines provided
transparently. Thus, for the Input/Output Manifold module
we need only provide a means of diverting the input and
output streams from one file to another, or to main memory.

C.  SUBSYSTEM  SELECTION. 

Given  the  broad  outline  of  system  module  function 
provided  above,  a  minimally  capable  prototype  subsystem  can 
be  selected  for  initial  implementation.  Such  a  subsystem 
must be capable of initialization, synthesis, display and
storage  of  an  AST  in  order  to  demonstrate  convincingly  the 
feasibility  of  the  concepts  outlined  in  previous  chapters. 
Facilities  to  evaluate  (execute),  revise,  and  debug 
previously  entered  AST's  may  be  deferred,  as  may  the 
facility  to  easily  install  a  new  Language  Definition. 
Therefore,  the  capabilities  provided  by  each  of  the  modules 
in  the  prototype  subsystem  may  be  redefined  as  follows: 

1. Data Structure Support Module.

Full packages supporting general ordered trees and
association lists are needed. In addition, a primitive
capability to store and reference string values is needed.
The capability to support sophisticated symbol table
structures may be deferred to such time as semantic
information is needed to allow execution of AST structures.

2. Grammar-Driven Environment Module.

The only major capability required by the prototype
subsystem is the "append" function, which can be used to
create AST structures. In addition, a working display
mechanism with simple cursor control facilities is needed.
A frame-oriented display mode is satisfactory for the
prototype system (although eventually a screen-oriented
display driver would be desirable). Finally, facilities for
storing and retrieving AST's to and from secondary storage,
as well as a facility (however cumbersome) for installing
new language definitions, are needed.

3. Input/Output Manifolds.

These modules need to be implemented in full, in
order that secondary storage may be used, and in order to
allow templates existing in primary memory to appear in the
input stream for processing by the Onload translator.

4. Onload and Offload Translators.

These components also must be fully implemented for
the same reason as the Input/Output Manifolds. The
implementation must be flexible enough so that as more
sophisticated data structure packages are added, the
syntax of the sequential representation can be extended to
accommodate onload and offload of keyfields in the new
structures.

5. Bootstrap Procedure.


The system can be initialized as follows. We
currently regard Language Definitions as being written in
one of three languages, or notational systems: a high-level
format (which is to consist of R-ARGOT notation with display
and semantic specification extensions), intermediate-level
(the notation developed in Chapter III), and low-level (the
sequentialized, pointer-free representation of an internal
tree corresponding to the desired LD, using the language
alluded to in Chapter IV).

There is no fundamental difference between the
intermediate and low-level formats, since they are two
alternative representations of the same database.
Translation from one format to the other is performed
automatically by the onload and offload translators when
this database is moved to and from secondary storage.

In order to bootstrap the system, once all of the
modules have been compiled and linked, it is necessary only
to perform the job of manually translating an
intermediate-level description of the intermediate-level
language to the corresponding low-level description and,
using a conventional editor, to install the resulting text
as a file accessible to the system.

At this point, the system facilities can be actuated
to load the file as a language description into system
primary memory. During the load, the onload translator will
convert the description into a linked representation of the
database needed to describe and guide the synthesis of new
language descriptions in the intermediate format. That is,
the system itself can now be used to create, as a
grammar-driven editor, additional language descriptions.

VI. SUMMARY


A.  CONCLUSIONS. 

In the preceding chapters, a conceptual foundation for
the interactive creation of databases, structured
hierarchically according to a given context-free grammar,
has been provided. The primary conclusions supported by
this work are:

1. A basic model for the described process is that of a
valid sentential form generator, rendered determinate by
allowing for the interactive selection of which production
to apply and at which point in the already-derived structure
the selected substitution is to be made.

2.  Notations  exist  (e.g.  the  R-ARGOT  notation)  for  the 
specification  of  general,  context-free  grammars  which  are 
both  human-oriented  and  directly  interpretable  as  the 
knowledge  base  for  such  a  system. 

3.  The  basic  mechanism  correctly  interprets  ambiguous 
or  incomplete  grammars,  as  well  as  allowing  for  the 
synthesis  of  correctly  labeled  incomplete  derivations. 

4. Analogous mechanisms can be described which derive
and display not strings, but derivation trees which are
morphisms of validly derived strings under the specified
grammar.

5. The grammatical notation can be transformed into
context-independent operation codes with arguments which can
be stored in the leaf nodes of the derived tree in such a
way that subsequent synthesis proceeds correctly, and
subtree deletion can be efficiently and consistently
performed without examination of the surrounding context in
the tree.

6. The resulting derivation trees can be used to encode
semantic information in such a way that the trees can be
evaluated correctly without further reference to the
syntactic, as opposed to physical, structure of the tree.
(This assertion is a speculation, not a firm conclusion.)

7.  A  method  exists  for  storing  such  structures  in  such 
a  way  that  their  consistency  does  not  depend  on  any  external 
data  structures  save  the  language  definition  itself. 
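The model named in the first conclusion, a sentential form generator made determinate by an outside choice of production and point of application, can be sketched in a few lines. The toy grammar, the dictionary encoding, and the function name apply_production are assumptions made for illustration and are not the notation of the thesis. Note that the grammar is deliberately ambiguous; since the operator selects each step, no parsing decision is ever required.

```python
# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "stmt": [["id", ":=", "expr"]],
    "expr": [["id"], ["expr", "+", "expr"]],
}


def apply_production(form, position, choice, grammar=GRAMMAR):
    """Replace the nonterminal at `position` by the chosen right-hand side.

    The caller (interactively, the user) supplies both the point of
    application and the production; invalid requests are rejected.
    """
    if not 0 <= position < len(form):
        raise IndexError("position lies outside the sentential form")
    symbol = form[position]
    if symbol not in grammar:
        raise ValueError(f"{symbol!r} is not a nonterminal")
    alternatives = grammar[symbol]
    if not 0 <= choice < len(alternatives):
        raise ValueError(f"no production {choice} for {symbol!r}")
    return form[:position] + alternatives[choice] + form[position + 1:]


form = ["stmt"]
form = apply_production(form, 0, 0)   # stmt -> id := expr
form = apply_production(form, 2, 1)   # expr -> expr + expr
```

Every intermediate value of form is a valid (possibly incomplete) sentential form, which corresponds to the correctly labeled incomplete derivations mentioned in the third conclusion.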

B.  WORK  IN  PROGRESS 

Implementation of the prototype subsystem is currently
in progress, with no difficulties currently foreseen. The
only module awaiting final coding and test is the
Grammar-Driven Environment module itself, and the algorithmic
specification of the functions needed has already been
accomplished. Provided that no further difficulties are
encountered, a complete description of the prototype
subsystem will later be provided as a Naval Postgraduate
School Technical Report.

The prototype subsystem code is oriented toward a
demonstration of technical feasibility as opposed to storage
or execution time efficiency. However, it has been written
in a highly-modularized manner, so that after
instrumentation and performance measurements, appropriate
modifications can be made fairly easily. An attempt has
been made to provide for the extension of the prototype
system to a more complete realization of the original system
design.

C.  FUTURE  RESEARCH  DIRECTIONS. 

After  completion  of  the  prototype  subsystem,  two 
directions are indicated for future investigation.

1. Extension of the Prototype Subsystem.

a.  Symbol  Table  Implementation. 

A  generalized  Symbol  table  data  type  must  be 
defined  which  will  adequately  support  a  wide  range  of 
programming  languages. 

b.  Semantic  Action  Implementation. 

A  class  of  primitive  operations  (including 
access  facilities  to  the  defined  symbol  table  structure) 
must be formulated, provision made for language-implementer
definition of additional primitives, and an AST interpreter

c. Pattern-Matching.


A pattern-matching facility should be provided
as part of the user interface as a sophisticated means of
cursor control. A fairly simple pattern-matching
capability, when combined with the pre-existing capability
to access the AST in a syntax-oriented way, would allow the
user to search and access the structure in very
sophisticated ways; e.g., such commands as "find the next
occurrence of an assignment to identifier a" could easily be
formulated. Moreover, when combined with a relatively
straightforward debug facility (for example, setting of
break-points), a very high-level program test facility could
be provided.
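A command such as "find the next occurrence of an assignment to identifier a" can be sketched as a pre-order search that starts just after the cursor. The Ast class, its kind and value fields, and the helper names are hypothetical; the real AST nodes would carry association lists rather than these two fields.

```python
from dataclasses import dataclass, field


@dataclass
class Ast:
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)


def find_next(root, cursor, predicate):
    """Return the first node after `cursor` (in pre-order) that matches.

    A cursor of None means the search starts at the root itself.
    """
    past_cursor = cursor is None
    for node in preorder(root):
        if past_cursor and predicate(node):
            return node
        if node is cursor:
            past_cursor = True
    return None


def assigns_to(name):
    """Build a pattern: an 'assign' node whose first child is identifier `name`."""
    def matches(node):
        return bool(node.kind == "assign" and node.children
                    and node.children[0].kind == "id"
                    and node.children[0].value == name)
    return matches


first = Ast("assign", children=[Ast("id", "a"), Ast("num", "1")])
other = Ast("assign", children=[Ast("id", "b"), Ast("num", "2")])
second = Ast("assign", children=[Ast("id", "a"), Ast("num", "3")])
program = Ast("block", children=[first, other, second])

hit = find_next(program, first, assigns_to("a"))
```

Because the pattern is expressed in terms of node kinds rather than source text, the same search works for any language whose definition labels assignments and identifiers in this way.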

d.  High  Level  Language  Descriptions. 

The high-level format for both syntactic and
semantic language specification should be formulated and
implemented as a more convenient means for implementing new
languages.

e.  Debugging  Tools. 

Provisions should be made to allow the user to
set breakpoints, access the current data environment, and
order step-by-step execution modes from the editor.

f.  Dynamic  Language  Changes. 

The feasibility of allowing language changes to
be made dynamically during AST creation or execution at
points specifiable in the language definition should be
investigated. Related to this problem is the provision of a
facility to link (perhaps dynamically) one AST to another.

g.  Increased  Storage  Efficiency. 

Once basic design parameters which are now indefinite
(such as the number of primitive operations) are made final,
the desirability of packing data fields into AST nodes rather
than using the space-inefficient association list
implementation, and the resulting impact on time-efficiency,
should be studied.

h.  Full  User  Interface. 

Deferred edit functions, such as delete and
insert, should be installed in the Grammar-Driven
Environment Module.

2.  Additional  Applications  for  the  Technology. 

The conceptual framework provided by this paper is
sufficiently general to support unexpected applications in
areas quite distant from the field of programming
environment design. A few such applications are suggested
below:

a.  Generalized  Editing. 

Generalized  editors,  as  described  in  [Fraser 
1980],  are  editors  which  provide  for  the  manipulation  and 
display  of  data  structures  other  than  text  files.  The 
mechanism  is  well-suited  for  the  direct  editing  of  a 
hierarchically organized database of any type.

b. Sparse Programming Languages.


Current programming languages are designed with
a parser-based implementation as a fundamental assumption.
For that reason, they typically include many keyword and
punctuation symbols which are irritating, because
superfluous, to human users. Because the described
technology can utilize ambiguous grammars, sparse languages
with the minimum amount of punctuation needed for human
comprehensibility can be described which could be
implemented using grammar-driven synthesis as the
fundamental input mechanism. In fact, improved performance
from the synthesizer could be expected for such a
"pseudo-code"-like language, since the inherent semantic
density of the derivation tree could be made very high.

c.  Artificial  Intelligence  Applications. 

In the described design, considerable pains have
been taken to provide a simple, uniform method for grammar
rule and point of application selection, suitable for use by
a human operator. There is no fundamental reason why very
complicated heuristic methods could not be used, however, to
select the rule to be applied and the place in the current
structure at which the application is to be made. For
instance, a production system (in the Artificial Intelligence
sense) could be used to perform this function. The resulting
hybrid system would have a heuristic front end, and an
algorithmic back end, with the desirable property that
whatever structure the heuristic front end attempted to
build, the resulting structure would always be guaranteed to
be correct in terms of the "deep structure" specified by the
language description. Attempts by the heuristic module to
perform inconsistent modifications would be detected,
prevented, and reported by the synthesis module. A
knowledge representation based on such a system would be
able to interact with the user in very irregular, and
occasionally incorrect, ways, while preserving a fundamental
internal database with guaranteed consistency.

APPENDIX A. NOTATIONAL SYSTEMS FOR CONTEXT-FREE GRAMMARS

1. BACKUS-NAUR FORMAT (in R-ARGOT)

context-free-grammar: + production .

production: non-terminal "::=" [ right-hand-side ] .

right-hand-side: + construct .

construct: { terminal ! non-terminal } .

non-terminal: "<" string ">" .

terminal: "string".

We assume that "string" is a sequence of characters from any
appropriate character set not including the metasymbols.

Note that this notation is in itself a regular language.

2. ARGOT NOTATION (in R-ARGOT)

ARGOT: + rule .

rule: rule-name ":" concatenation .

concatenation: + sub-expression .

sub-expression: { optional-iteration
                ! simple-iteration
                ! list-iteration
                ! option
                ! alternation
                ! optional-alternation
                ! rule-name
                ! terminal
                ! group
                } .

optional-iteration: "*" sub-expression .
simple-iteration: sub-expression .

list-iteration: sub-expression sub-expression "..." .

option: "[" concatenation "]" .


alternation: "{" concatenation "!" alternatives "}" .

optional-alternation: "[" concatenation "!" alternatives "]" .

alternatives: # concatenation ...

group: "(" concatenation ")" .

terminal: """ string """ .

rule-name: string .

("string" is taken to be a predefined rule.)

3. R-ARGOT (in R-ARGOT)

R-ARGOT: + rule .

rule: rule-name expression .

expression: { concatenation
            ! iteration
            ! list-iteration
            ! alternation
            } .

concatenation: + field .

iteration: "+" rule-name .

list-iteration: "#" rule-name field

alternation: "{" rule-name "!" alternatives .

alternative: # rule-name "!" ... .

field: { rule-name
       ! option
       ! terminal
       } .

option: rule-name "]" .

terminal: """ string """ .
rule-name: string .

Note that this notation is, in itself, a regular language.

APPENDIX B. A GRAMMAR FOR PASCAL
IN R-ARGOT

PASCAL:  "program" identifier "(" name-list ")" ";"
         block "." .

block:  [ labels ] [ constants ] [ types ] [ variables ]
        [ subroutines ] "begin" statements "end" .

labels:  "label" integers .

constants:  "constant" c-decls ";" .

types:  "type" t-decls ";" .

variables:  "var" v-decls ";" .

subroutines:  + s-decl .

integers:  + integer .

c-decls:  # c-decl ";" ... .

c-decl:  identifier "=" constant .

t-decls:  # t-decl ";" ... .

t-decl:  identifier "=" type .

v-decls:  # v-decl ";" ... .

v-decl:  name-list ":" type .

name-list:  # identifier "," ... .

s-decl:  { p-decl
         ! f-decl
         } .

p-decl:  "procedure" identifier [ parameters ] ";"
         block ";" .

f-decl:  "function" identifier [ parameters ] ":" identifier ";"
         block ";" .

parameters:  "(" param-list ")" .

param-list:  # param-section ";" ... .

param-section:  { f-params
                ! v-params
                ! p-params
                ! c-params
                } .

f-params:  "function" name-list ":" identifier .

v-params:  "var" name-list ":" identifier .

p-params:  "procedure" name-list .

c-params:  name-list ":" identifier .

type:  { scalar-type
       ! subrange-type
       ! pointer-type
       ! set-type
       ! array-type
       ! record-type
       ! file-type
       ! identifier
       } .

scalar-type:  "(" name-list ")" .

subrange-type:  constant ".." constant .

pointer-type:  "^" identifier .

set-type:  [ packed ] "set" "of" simple-type .

array-type:  [ packed ] "array" "[" subscripts "]" "of" type .

record-type:  [ packed ] "record" [ field-list ] "end" .

file-type:  [ packed ] "file" "of" type .

packed:  "packed" .

simple-type:  { identifier
              ! scalar-type
              ! subrange-type
              } .

field-list:  { var-fields
             ! mixed-fields
             } .

mixed-fields:  fixed-fields [ and-var-fields ] .

and-var-fields:  ";" var-fields .

fixed-fields:  # fixed-field ";" ... .

fixed-field:  name-list ":" type .

var-fields:  "case" [ tag ] identifier "of"
             variants .

variants:  # variant ";" ... .

variant:  constant-list ":" "(" [ field-list ] ")" .

constant-list:  # constant "," ... .

statements:  # statement ";" ... .

statement:  [ integer ] [ action ] .

action:  { assignment
         ! procedure-call
         ! compound
         ! if-statement
         ! repeat
         ! while
         ! for
         ! case-statement
         ! goto
         ! with
         } .

assignment:  variable ":=" expression .

procedure-call:  identifier [ arguments ] .

arguments:  "(" arglist ")" .

arglist:  # argument "," ... .

argument:  { identifier
           ! expression
           } .

compound:  "begin"
           statements
           "end" .

if-statement:  "if" expression "then"
               statement
               [ else-part ] .

else-part:  "else"
            statement .

repeat:  "repeat"
         statements
         "until" expression .

while:  "while" expression "do"
        statement .

for:  "for" identifier ":=" expression t-or-d expression "do"
      statement .

t-or-d:  { downto
         ! to
         } .

downto:  "downto" .

to:  "to" .

case-statement:  "case" expression "of"
                 cases
                 "end" .

cases:  # case ";" ... .

case:  constant-list ":" statement .

with:  "with" variables "do"
       statement .

goto:  "goto" integer .

variables:  # variable "," ... .

expression:  { lt
             ! lte
             ! eq
             ! gte
             ! gt
             ! neq
             ! in
             ! s-expression
             } .

lt:   s-expression "<" s-expression .
lte:  s-expression "<=" s-expression .
eq:   s-expression "=" s-expression .
gte:  s-expression ">=" s-expression .
gt:   s-expression ">" s-expression .
neq:  s-expression "<>" s-expression .
in:   s-expression "in" s-expression .

s-expression:  [ sign ] u-expression .

sign:  { plus-sign
       ! minus-sign
       } .

plus-sign:  "+" .

minus-sign:  "-" .

u-expression:  { plus
               ! minus
               ! or
               ! term
               } .

plus:  term "+" term .

minus:  term "-" term .

or:  term "or" term .

term:  { times
       ! quot
       ! div
       ! mod
       ! and
       ! factor
       } .

times:  factor "*" factor .

quot:  factor "/" factor .

div:  factor "div" factor .

mod:  factor "mod" factor .

and:  factor "and" factor .

factor:  { group
         ! not
         ! set
         ! v-or-c
         } .

group:  "(" expression ")" .

not:  "not" factor .

set:  "[" [ set-members ] "]" .

set-members:  # set-member "," ... .

set-member:  { range
             ! expression
             } .

range:  expression ".." expression .

v-or-c:  { unsigned-constant
         ! variable
         } .

variable:  identifier [ modifiers ] .

modifiers:  + modifier .

modifier:  { subscript
           ! field-reference
           ! indirection
           } .

subscript:  "[" expressions "]" .

field-reference:  "." identifier .

indirection:  "^" .

expressions:  # expression "," ... .

It is assumed that predefined input scanners exist for
the rule names "integer", "identifier", "constant", and
"unsigned-constant".
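
The four R-ARGOT rule forms used in this grammar (concatenation, alternation, "+" iteration, and "#" list) can be held as plain data and used to drive a recognizer. The following Python fragment is only a sketch under stated assumptions: the GRAMMAR table, its tuple encoding, and the match function are hypothetical names chosen for illustration, the regular expression stands in for the predefined "identifier" scanner, and only a handful of the Pascal rules are transcribed.

```python
import re

# Hypothetical encoding: rule name -> (form, ...). Quoted names are terminals.
GRAMMAR = {
    "scalar-type": ("concat", ['"("', "name-list", '")"']),
    "name-list":   ("list", "identifier", '","'),      # # identifier "," ...
    "names":       ("iter", "identifier"),             # + identifier
    "sign":        ("alt", ["plus-sign", "minus-sign"]),
    "plus-sign":   ("concat", ['"+"']),
    "minus-sign":  ("concat", ['"-"']),
}


def match(rule, tokens, pos=0):
    """Return the position just after a match of `rule` at `pos`, or None."""
    if rule.startswith('"'):                    # terminal such as '"("'
        hit = pos < len(tokens) and tokens[pos] == rule[1:-1]
        return pos + 1 if hit else None
    if rule == "identifier":                    # stand-in for a predefined scanner
        hit = pos < len(tokens) and re.fullmatch(r"[A-Za-z]\w*", tokens[pos])
        return pos + 1 if hit else None
    form, *body = GRAMMAR[rule]
    if form == "concat":                        # every field, in order
        for part in body[0]:
            pos = match(part, tokens, pos)
            if pos is None:
                return None
        return pos
    if form == "alt":                           # first alternative that matches
        for choice in body[0]:
            end = match(choice, tokens, pos)
            if end is not None:
                return end
        return None
    if form == "iter":                          # one or more items
        pos = match(body[0], tokens, pos)
        while pos is not None:
            nxt = match(body[0], tokens, pos)
            if nxt is None:
                return pos
            pos = nxt
        return None
    if form == "list":                          # item, then (separator item)*
        item, sep = body
        pos = match(item, tokens, pos)
        while pos is not None:
            after_sep = match(sep, tokens, pos)
            nxt = None if after_sep is None else match(item, tokens, after_sep)
            if nxt is None:
                return pos
            pos = nxt
        return None
    raise ValueError(f"unknown rule form {form!r}")
```

For example, match("scalar-type", ["(", "red", ",", "green", ")"]) consumes all five tokens, while a token list that stops after the comma is rejected because the closing parenthesis is never found.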

APPENDIX C: TRANSFORMATION TEMPLATE GRAMMAR

The following grammar defines symbol strings which are
interpreted as calls to tree-building and node-modifying
routines whose existence is assumed, as is the interpreter
which makes those calls. Also implicit in the following
definitions and discussion is the notion of a "current node",
defined for the purpose of the application of templates to
be any free node in an AST.


template:    { subtree ! siblist } .

subtree:     boundnode [ childlist ] .

childlist:   "(" siblist ")" .

siblist:     # freenode ";" ... .

boundnode:   boundop rulefield .

freenode:    freeop rulefield .

rulefield:   "," rulename .

boundop:     { HEAD ! ITER ! LIST ! pdf } .

pdf:         { (predefined functions) } .

freeop:      { NT ! ALT ! COPT ! IOPT ! LOPT ! TERM } .

rulename:    { (grammar rulenames)
             ! (predefined rulenames) } .


The Template Grammar produces operator and rulename
pairs, both bound and free, punctuated by the terminal
symbols "(", ";", "," and ")", which are interpreted as follows:

"(":  Create a child node under the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

";":  Create a right sibling of the current node, make
the node created the current node, and overwrite the OP
field with the operator listed next.

",":  Overwrite the RULE field of the current node with
the rulename listed next.

")":  Make the father of the current node the new
current node.


The first symbol of every template is an operator,
either free or bound, which overwrites the OP field of the
current node. The current node is the only node in the AST
which is modified in any way by a template; new nodes may
be created, but always within the context of the current
node.
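
The interpretation of the four punctuation symbols "(", ";", "," and ")" can be sketched as a small interpreter. The following Python fragment is a minimal sketch under stated assumptions, not the interpreter assumed by the grammar: the Node class and the functions tokenize and apply_template are hypothetical names, templates are assumed to be given as text with "," between operator and rulename and ";" between siblings, and a sibling can only be created for a node that already has a father.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)          # identity comparison, so list.index finds the exact node
class Node:
    op: str = ""
    rule: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


def tokenize(template: str) -> List[str]:
    """Split a template into operators, rulenames and punctuation symbols."""
    for symbol in "(),;":
        template = template.replace(symbol, f" {symbol} ")
    return template.split()


def apply_template(node: Node, template: str) -> Node:
    """Apply `template` with `node` as the current node; return `node`."""
    tokens = tokenize(template)
    current = node
    current.op = tokens[0]                 # first symbol overwrites the OP field
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("(", ";"):
            if tok == "(":                 # create a child of the current node
                new = Node(parent=current)
                current.children.append(new)
            else:                          # create a right sibling
                father = current.parent
                if father is None:
                    raise ValueError("cannot add a sibling to a node with no father")
                new = Node(parent=father)
                father.children.insert(father.children.index(current) + 1, new)
            new.op = tokens[i + 1]         # OP field takes the operator listed next
            current = new
            i += 2
        elif tok == ",":                   # RULE field takes the rulename listed next
            current.rule = tokens[i + 1]
            i += 2
        elif tok == ")":                   # father becomes the current node
            current = current.parent
            i += 1
        else:
            raise ValueError(f"unexpected symbol {tok!r}")
    return node
```

Applied to a node NT,c, the template "HEAD,c ( NT,r1 ; COPT,r2 ; NT,r3 )" leaves that node labelled HEAD,c with three new children, and only the current node's own fields are overwritten.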

The templates defined by this grammar allow definition
of the transformations in Chapter III. The following
examples illustrate the various constructions most commonly
encountered.


1. Single node replacement, rule field unchanged

   Transformation:
        NT,a  =>  ALT,a
   Template:
        ALT,a

2. Single node replacement, operator and rulename modified

   Transformation:
        ALT,a  =>  NT,r
   Template:
        NT,r

3. Replacement with sibling string

   Transformation:
        IOPT,i  =>  COPT,r2  NT,r1  IOPT,i
   Template:
        COPT,r2 ; NT,r1 ; IOPT,i

4. Replacement with subtree

   Transformation:
        NT,c  =>  NT,r1  COPT,r2  NT,r3
   Template:
        HEAD,c ( NT,r1 ; COPT,r2 ; NT,r3 )


APPENDIX D: INTERMEDIATE-LEVEL LANGUAGE DEFINITION GRAMMAR

ILD:         langname rulelist [extensions].

rulelist:    + rule.

rule:        { c-rule
             ! a-rule
             ! i-rule
             ! l-rule } .

c-rule:      { c-rule-a
             ! c-rule-b }.

c-rule-a:    c-rulename ":" cdef-a
             "=>" ctla "=>" csla.

cdef-a:      + defpart.

defpart:     { rulename ! option ! terminal }.

option:      "[" rulename "]".

ctla:        headop "(" freelist ")".

headop:      { head ! pdf }.

head:        "HEAD".

pdf:         { (predefined functions) }.

freelist:    # freenode ";" ... .

freenode:    freeop "," rulename.

freeop:      { nt ! copt }.

nt:          "NT".

copt:        "COPT".

csla:        + dispart.

dispart:     { subtree ! literal ! format } .

subtree:     integer [opdisfld].

opdisfld:    { optpdf ! pdfpdf ! undpdf }.

optpdf:      "=""[" rulename "]""".

pdfpdf:      "=""<" rulename ">""".

undpdf:      "=""(" rulename ")""".

c-rule-b:    c-rulename ":" cdef-b
             "=>" ctlb "=>" cslb.

cdef-b:      terminal.

ctlb:        "HEAD," c-rulename.

cslb:        + termpart.

termpart:    { literal ! format }.

a-rule:      a-rulename adef
             "=>" at1 "=>" at2 "=>" as1 "=>" as2.

adef:        "{" altlist "}".

altlist:     # alt "!" ... .

alt:         altchar rulename.

at1:         "ALT," a-rulename.

at2:         "{" alt-temp "}".

alt-temp:    # alt-t "!" ... .

alt-t:       altchar ": NT," rulename.

as1:         "{" a-rulename "}".

as2:         "{" alt-disp "}".

alt-disp:    # alt-d "!" ... .

alt-d:       altchar rulename.

i-rule:      i-rulename idef
             "=>" it1 "=>" it2 "=>" is1 "=>" is2.

idef:        "+" rulename1.

it1:         "ITER ( NT," rulename1 "; IOPT," i-rulename ")".

it2:         "NT," rulename1 "; IOPT," i-rulename.

is1:         "$1".

is2:         "[" i-rulename "]".

l-rule:      { l-rule-a
             ! l-rule-b
             ! l-rule-c }.

l-rule-a:    l-rulename ":" ldef-a
             "=>" lt1 "=>" lt2a "=>" ls1 "=>" ls2a "=>" ls3.

ldef-a:      "#" rulename1 rulename2 "...".

lt2a:        "NT," rulename2 "; NT," rulename1
             "; LOPT," l-rulename.

ls2a:        "$1$2".

l-rule-b:    l-rulename ":" ldef-b
             "=>" lt1 "=>" lt2b "=>" ls1 "=>" ls2b "=>" ls3.

ldef-b:      "#" rulename1 "[" rulename2 "]" "...".

lt2b:        "COPT," rulename2 "; NT," rulename1
             "; LOPT," l-rulename.

ls2b:        "$1=[" rulename2 "]$2".

l-rule-c:    l-rulename ":" ldef-c
             "=>" lt1 "=>" lt2c "=>" ls1 "=>" ls2c "=>" ls3.

ldef-c:      "#" rulename1 terminal "...".

lt2c:        "NT," rulename1 "; LOPT," l-rulename.

ls2c:        terminal "$1".

lt1:         "LIST ( NT," rulename1 "; LOPT," l-rulename ")".

ls1:         "$1".

ls3:         "[" l-rulename "]".

format:      { newline ! tab ! untab }.

newline:     "NL".

tab:         "TB".

untab:       "UT".

extensions:  userpdr userpdf.

userpdr:     (undefined) .

userpdf:     (undefined) .


APPENDIX E: ILD GRAMMAR LANGUAGE DEFINITION


ILD:  langname rulelist [extensions]
    => ILD,ILD
       (NT,String;
        NT,rulelist;
        COPT,extensions)
    => $1="<langname>" $2 $3="[extensions]" .

rulelist:  + rule
    => ITER,rulelist
       (NT,rule;
        IOPT,rulelist)
    => NT,rule;
       IOPT,rulelist
    => $1
    => "[rulelist]" .

rule:  { c-rule
       ! a-rule
       ! i-rule
       ! l-rule }
    => ALT,rule
    => { c:NT,c-rule
       ! a:NT,a-rule
       ! i:NT,i-rule
       ! l:NT,l-rule }
    => "{rule}"
    => "{ c:c-rule ! a:a-rule ! i:i-rule ! l:l-rule }" .

c-rule:  { c-rule-a
         ! c-rule-b }
    => ALT,c-rule
    => { a:c-rule-a
       ! b:c-rule-b }
    => "{c-rule}"
    => "{ a:c-rule-a ! b:c-rule-b }" .

c-rule-a:  c-rulename ":" cdef-a
           "=>" ctla "=>" csla
    => HEAD,c-rule-a
       (NT,String;
        NT,cdef-a;
        NT,ctla;
        NT,csla)
    => $1="<c-rulename>" $2 "=>" $3 "=>" $4 .

cdef-a:  + defpart
    => ITER,cdef-a
       (NT,defpart;
        IOPT,cdef-a)
    => NT,defpart;
       IOPT,cdef-a
    => $1
    => "[defpart]" .

defpart:  { rulename ! option ! terminal }
    => ALT,defpart
    => { r:NT,String
       ! o:NT,option
       ! t:NT,terminal }
    => "{defpart}"
    => "{ r:rulename ! o:option ! t:terminal }" .

option:  "[" rulename "]"
    => HEAD,option
       (NT,String)
    => "[" $1="<rulename>" "]" .

ctla:  headop "(" freelist ")"
    => HEAD,ctla
       (NT,headop;
        NT,freelist)
    => $1 "(" $2 ")" .

headop:  { head ! pdf }
    => ALT,headop
    => { h:NT,head
       ! p:NT,pdf }
    => "{headop}"
    => "{ h:HEAD ! p:pdf }" .

head:  "HEAD"
    => HEAD,head
    => "HEAD" .

pdf:  { (predefined functions) }
    => ALT,pdf


freelist:  # freenode ";" ...
    => LIST,freelist
       (NT,freenode;
        LOPT,freelist)
    => NT,freenode;
       LOPT,freelist
    => $1
    => $1
    => "[freenode]" .

freenode:  freeop "," rulename
    => HEAD,freenode
       (NT,freeop;
        NT,String)
    => $1 $2="<rulename>" .

freeop:  { nt ! copt }
    => ALT,freeop
    => { n:NT,nt
       ! c:NT,copt }
    => "{freeop}"
    => "{ n:NT ! c:COPT }" .

nt:  "NT"
    => HEAD,nt
    => "NT" .

copt:  "COPT"
    => HEAD,copt
    => "COPT" .

csla:  + dispart
    => ITER,csla
       (NT,dispart;
        IOPT,csla)
    => NT,dispart;
       IOPT,csla
    => $1
    => "[dispart]" .

dispart:  { subtree ! literal ! format }
    => ALT,dispart
    => { s:NT,subtree
       ! l:NT,literal
       ! f:NT,format }
    => "{dispart}"
    => "{ s:subtree ! l:literal ! f:format }" .

subtree:  integer [opdisfld]
    => HEAD,subtree
       (NT,Integer;
        COPT,opdisfld)
    => "$" $1="<integer>" $2="[opdisfld]" .

opdisfld:  { optpdf ! pdfpdf ! undpdf }
    => ALT,opdisfld
    => { o:NT,optpdf
       ! p:NT,pdfpdf
       ! u:NT,undpdf }
    => "{opdisfld}"
    => "{ o:optpdf ! p:pdfpdf ! u:undpdf }" .

optpdf:  "=""[" rulename "]"""
    => HEAD,optpdf
       (NT,String)
    => "=""[" $1="<rulename>" "]""" .

pdfpdf:  "=""<" rulename ">"""
    => HEAD,pdfpdf
       (NT,String)
    => "=""<" $1="<rulename>" ">""" .

undpdf:  "=""(" rulename ")"""
    => HEAD,undpdf
       (NT,String)
    => "=""(" $1="<rulename>" ")""" .


c-rule-b:  c-rulename ":" cdef-b
           "=>" ctlb "=>" cslb
    => HEAD,c-rule-b
       (NT,String;
        NT,cdef-b;
        NT,ctlb;
        NT,cslb)
    => $1="<c-rulename>" $2 "=>" $3 "=>" $4 .

cdef-b:  terminal
    => HEAD,cdef-b
       (NT,terminal)
    => $1 .

ctlb:  "HEAD," c-rulename
    => HEAD,ctlb
       (NT,String)
    => "HEAD," $1="<c-rulename>" .

cslb:  + termpart
    => ITER,cslb
       (NT,termpart;
        IOPT,cslb)
    => NT,termpart;
       IOPT,cslb
    => $1
    => "[termpart]" .

termpart:  { literal ! format }
    => ALT,termpart
    => { l:NT,literal
       ! f:NT,format }
    => "{termpart}"
    => "{ l:literal ! f:format }" .


a-rule:  a-rulename adef
         "=>" at1 "=>" at2 "=>" as1 "=>" as2
    => HEAD,a-rule
       (NT,String;
        NT,adef;
        NT,at1;
        NT,at2;
        NT,as1;
        NT,as2)
    => $1="<a-rulename>" ":" $2
       "=>" $3 "=>" $4 "=>" $5 "=>" $6 .

adef:  "{" altlist "}"
    => HEAD,adef
       (NT,altlist)
    => "{" altlist "}" .

altlist:  # alt "!" ...
    => LIST,altlist
       (NT,alt;
        LOPT,altlist)
    => NT,alt;
       LOPT,altlist
    => $1
    => $1
    => "[altlist]" .

alt:  altchar rulename
    => HEAD,alt
       (NT,Character;
        NT,String)
    => $1="<altchar>" ":" $2="<rulename>" .

at1:  "ALT," a-rulename
    => HEAD,at1
       (NT,String)
    => "ALT," $1="<a-rulename>" .

at2:  "{" alt-temp "}"
    => HEAD,at2
       (NT,alt-temp)
    => "{" $1 "}" .

alt-temp:  # alt-t "!" ...
    => LIST,alt-temp
       (NT,alt-t;
        LOPT,alt-temp)
    => NT,alt-t;
       LOPT,alt-temp
    => $1
    => $1
    => "[alt-t]" .

alt-disp:  # alt-d "!" ...
    => LIST,alt-disp
       (NT,alt-d;
        LOPT,alt-disp)
    => NT,alt-d;
       LOPT,alt-disp
    => $1
    => $1
    => "[alt-disp]" .

alt-d:  altchar rulename
    => HEAD,alt-d
       (NT,Character;
        NT,String)
    => $1="<altchar>" $2="<rulename>" .

i-rule:  i-rulename idef
         "=>" it1 "=>" it2 "=>" is1 "=>" is2
    => HEAD,i-rule
       (NT,String;
        NT,idef;
        NT,it1;
        NT,it2;
        NT,is1;
        NT,is2)
    => $1="<i-rulename>" $2
       "=>" $3 "=>" $4 "=>" $5 "=>" $6 .

idef:  "+" rulename1
    => HEAD,idef
       (NT,String)
    => $1="<rulename1>" .

it1:  "ITER ( NT," rulename1 "; IOPT," i-rulename ")"
    => HEAD,it1
       (NT,String;
        NT,String)
    => "ITER ( NT," $1="<rulename1>" "; IOPT,"
       $2="<i-rulename>" ")" .

it2:  "NT," rulename1 "; IOPT," i-rulename
    => HEAD,it2
       (NT,String;
        NT,String)
    => "NT," $1="<rulename1>" "; IOPT," $2="<i-rulename>" .

is1:  "$1"
    => HEAD,is1
    => "$1" .

is2:  "[" i-rulename "]"
    => HEAD,is2
       (NT,String)
    => "[" $1="<i-rulename>" "]" .


l-rule:  { l-rule-a
         ! l-rule-b
         ! l-rule-c }
    => ALT,l-rule
    => { a:NT,l-rule-a
       ! b:NT,l-rule-b
       ! c:NT,l-rule-c }
    => "{l-rule}"
    => "{ a:l-rule-a ! b:l-rule-b ! c:l-rule-c }" .

l-rule-a:  l-rulename ":" ldef-a
           "=>" lt1 "=>" lt2a "=>" ls1 "=>" ls2a "=>" ls3
    => HEAD,l-rule-a
       (NT,String;
        NT,ldef-a;
        NT,lt1;
        NT,lt2a;
        NT,ls1;
        NT,ls2a;
        NT,ls3)
    => $1="<l-rulename>" $2
       "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-a:  "#" rulename1 rulename2 "..."
    => HEAD,ldef-a
       (NT,String;
        NT,String)
    => "#" $1="<rulename1>" $2="<rulename2>" "..." .

lt2a:  "NT," rulename2 "; NT," rulename1
       "; LOPT," l-rulename
    => HEAD,lt2a
       (NT,String;
        NT,String;
        NT,String)
    => "NT," $1="<rulename2>" "; NT," $2="<rulename1>"
       "; LOPT," $3="<l-rulename>" .

ls2a:  "$1$2"
    => HEAD,ls2a
    => "$1$2" .

l-rule-b:  l-rulename ":" ldef-b
           "=>" lt1 "=>" lt2b "=>" ls1 "=>" ls2b "=>" ls3
    => HEAD,l-rule-b
       (NT,String;
        NT,ldef-b;
        NT,lt1;
        NT,lt2b;
        NT,ls1;
        NT,ls2b;
        NT,ls3)
    => $1="<l-rulename>" $2
       "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-b:  "#" rulename1 "[" rulename2 "]"
    => HEAD,ldef-b
       (NT,String;
        COPT,String)
    => "#" $1="<rulename1>" "[" $2="<rulename2>" "]" .

lt2b:  "COPT," rulename2 "; NT," rulename1
       "; LOPT," l-rulename
    => HEAD,lt2b
       (COPT,String;
        NT,String;
        NT,String)
    => "COPT," $1="<rulename2>" "; NT," $2="<rulename1>"
       "; LOPT," $3="<l-rulename>" .

ls2b:  "$1=[" rulename2 "]$2"
    => HEAD,ls2b
       (NT,String)
    => "$1=[" $1="<rulename2>" "]$2" .

l-rule-c:  l-rulename ":" ldef-c
           "=>" lt1 "=>" lt2c "=>" ls1 "=>" ls2c "=>" ls3
    => HEAD,l-rule-c
       (NT,String;
        NT,ldef-c;
        NT,lt1;
        NT,lt2c;
        NT,ls1;
        NT,ls2c;
        NT,ls3)
    => $1="<l-rulename>" $2
       "=>" $3 "=>" $4 "=>" $5 "=>" $6 "=>" $7 .

ldef-c:  "#" rulename1 terminal
    => HEAD,ldef-c
       (NT,String;
        NT,terminal)
    => "#" $1="<rulename1>" $2 .

lt2c:  "NT," rulename1 "; LOPT," l-rulename
    => HEAD,lt2c
       (NT,String;
        NT,String)
    => "NT," $1="<rulename1>" "; LOPT," $2="<l-rulename>" .

ls2c:  terminal "$1"
    => HEAD,ls2c
       (NT,terminal)
    => $1="<terminal>" "$1" .

lt1:  "LIST ( NT," rulename1 "; LOPT," l-rulename ")"
    => HEAD,lt1
       (NT,String;
        NT,String)
    => "LIST ( NT," $1="<rulename1>" "; LOPT,"
       $2="<l-rulename>" ")" .

ls1:  "$1"
    => HEAD,ls1
    => "$1" .

ls3:  "[" l-rulename "]"
    => HEAD,ls3
       (l-rulename)
    => "[" $1="<l-rulename>" "]" .

terminal:  literal
    => HEAD,terminal
       (NT,String)
    => """ $1="<terminal>" .

literal:  literal
    => HEAD,literal
       (NT,String)
    => """ $1="<literal>" .

format:  { newline ! tab ! untab }
    => ALT,format
    => { n:NT,newline
       ! t:NT,tab
       ! u:NT,untab }
    => "{format}"
    => "{ n:newline ! t:tab ! u:untab }" .

newline:  "NL"
    => HEAD,newline
    => "NL" .

tab:  "TB"
    => HEAD,tab
    => "TB" .

untab:  "UT"
    => HEAD,untab
    => "UT" .

extensions:  userpdr userpdf
    => HEAD,extensions
       (NT,userpdr;
        NT,userpdf)
    => $1 $2 .

userpdr:  (undefined) .

userpdf:  (undefined) .

String, Integer, and Character are system predefined rules.
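
Each entry in this Language Definition consists of a rule definition followed by templates and display schemas, separated by "=>". Because "=>" also occurs as a quoted terminal inside several entries, a reader of such entries has to ignore separators that fall inside quotation marks. The following Python fragment is a minimal sketch of that step only; split_entry is a hypothetical helper, and the entries are assumed to be available as single text strings in which a doubled quotation mark denotes a quotation mark inside a terminal.

```python
from typing import List


def split_entry(text: str) -> List[str]:
    """Split an entry at each "=>" that is not inside a quoted terminal."""
    parts, buf = [], []
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # A doubled quote toggles twice, so the state is unchanged after it.
            in_quote = not in_quote
        if not in_quote and text.startswith("=>", i):
            parts.append("".join(buf).strip())
            buf = []
            i += 2
            continue
        buf.append(ch)
        i += 1
    parts.append("".join(buf).strip())
    return parts
```

The first part returned is the rule definition; how many of the remaining parts are templates and how many are display schemas depends on the kind of rule, as the entries above show.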


APPENDIX F: MEMORANDUM LANGUAGE DEFINITION


The following Language Definition, constructed by hand,
illustrates the templates and schemas required for the
definition of a simple grammar. When realized as an AST via
the ILD Grammar Directed Editor and interpreted by the
system predefined function ILD, this Language Definition could
be installed in the Language Definition Module as part of a
Memorandum GDE.


memo:  [salutation] body [closing]
    => ILD,memo
       (COPT,salutation;
        NT,body;
        COPT,closing)
    => NL $1="[salutation]" $2 NL TB TB TB $3="[closing]" .

salutation:  "Dear" name
    => HEAD,salutation
       (NT,String)
    => "Dear" $1="<name>" .

body:  + paragraph
    => ITER,body
       (NT,paragraph;
        IOPT,body)
    => NT,paragraph;
       IOPT,body
    => NL TB UT $1
    => NL "[paragraph]" .

paragraph:  + lines
    => ITER,paragraph
       (NT,String;
        IOPT,paragraph)
    => NT,String;
       IOPT,paragraph
    => $1="<line>" NL
    => "[line]" NL .

closing:  "Sincerely," name
    => HEAD,closing
       (NT,String)
    => "Sincerely," NL $1="<name>" .

String is a system predefined rule.
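
The format symbols NL, TB and UT in the display schemas above can be illustrated with a small layout routine. The following Python fragment is a sketch under a loudly stated assumption: this appendix does not define the exact behaviour of the three symbols, so the fragment assumes that NL starts a new line at the current indentation level, that TB emits one tab and raises the level, and that UT lowers the level without emitting anything. The function names layout and shown are hypothetical, as is the reading of the quoted text after "$n=" as a placeholder shown while the subtree is still empty.

```python
from typing import List, Optional


def shown(value: Optional[str], placeholder: str) -> str:
    """Text for a field such as $1="<name>": the value, or the placeholder if unset."""
    return value if value is not None else placeholder


def layout(tokens: List[str]) -> str:
    """Render a flat list of format symbols and text pieces (assumed semantics)."""
    out: List[str] = []
    level = 0
    for tok in tokens:
        if tok == "NL":                # new line, indented to the current level
            out.append("\n" + "\t" * level)
        elif tok == "TB":              # one tab now, and a deeper level afterwards
            out.append("\t")
            level += 1
        elif tok == "UT":              # shallower level for the lines that follow
            level = max(0, level - 1)
        else:                          # literal text or an already rendered subtree
            out.append(tok)
    return "".join(out)
```

Under these assumptions the body schema "NL TB UT $1" indents only the first line of a paragraph, and the three TB symbols before the closing also indent the name that follows "Sincerely," on the next line.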


APPENDIX G: SYSTEM PREDEFINED FUNCTIONS


The following is a list of programming language
primitive operations, derived in part from [Pratt, 1975], which
could be implemented as System Predefined Functions. This
list is not intended as a comprehensive collection of the
primitives desired, or even required, for implementation of
a GDE system. Rather, these functions are presented here as
an indication of the classes of operations which must be
made available in support of users of the GDE.

Synthesis Operators

     1.  NT
     2.  COPT
     3.  IOPT
     4.  LOPT
     5.  ALT
     6.  TERM
     7.  HEAD
     8.  ITER
     9.  LIST

Arithmetic Operators

    10.  PLUS
    11.  MINUS
    12.  MUL       multiplication
    13.  DIV       division
    14.  REM       remainder
    15.  UPLUS     unary plus
    16.  UMINUS    unary minus

Relational Operators

    17.  EQUAL     equality
    18.  NTEQ      not equal
    19.  GT        greater than
    20.  LT        less than
    21.  GTE       greater than or equal
    22.  LTE       less than or equal

Boolean Operators

    23.  AND
    24.  OR
    25.  NOT

Assignment Operators

    26.  ASNA      arithmetic assignment
    27.  ASNS      string assignment

Sequence Control Operators

    28.  COND      if-then-else conditional
    29.  LOOP      generalized loop

Symbol Table and Data Element Operators

    32.  DECLARE   declaration
    33.  BLOCK
    34.  IDENT     identifier
    35.  NUMBER
    36.  STRING

System Operators

    37.  ILD       AST to Language Definition translation

Miscellaneous

    38.  NOP       null operation


CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk ! "["rk"]" ! tk }

            <rk>        if xk = rk
    <c> =>
            copt(rk)    if xk = "["rk"]"

    copt(r) => <r>

ALTERNATION:
    a : "{" r1 "!" r2 "!" ... "!" rn "}"

    <a> => { <r1> ! <r2> ! ... ! <rn> }

ITERATION:
    i : "+" r

    <i> => <r> iopt(i)

    iopt(i) => <r> iopt(i)

LIST:
    l : "#" r1 x "..." ,   x = { r2 ! "["r2"]" ! t }

    <l> => <r1> lopt(l)

                <r2> <r1> lopt(l)        if x = r2
    lopt(l) =>  copt(r2) <r1> lopt(l)    if x = "["r2"]"
                <r1> lopt(l)             if x = t

PREDEFINED:
    p : pdf

    <p> => pdf(p)

UNDEFINED:

    <u> => <u>


    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = { C,A,I,L,P,U }
    t in T = { terminal symbols }

Figure 2. Transformations

CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk ! "["rk"]" ! tk }

             NT,rk      if xk = rk
    NT,c =>
             COPT,rk    if xk = "["rk"]"

    COPT,r => NT,r

ALTERNATION:
    a : "{" r1 "!" r2 "!" ... "!" rn "}"

    NT,a => { NT,r1 ! NT,r2 ! ... ! NT,rn }

ITERATION:
    i : "+" r

    NT,i => NT,r IOPT,i

    IOPT,i => NT,r IOPT,i

LIST:
    l : "#" r1 x "..." ,   x = { r2 ! "["r2"]" ! t }

    NT,l => NT,r1 LOPT,l

               NT,r2 NT,r1 LOPT,l      if x = r2
    LOPT,l =>  COPT,r2 NT,r1 LOPT,l    if x = "["r2"]"
               NT,r1 LOPT,l            if x = t

PREDEFINED:
    p : pdf

    NT,p => PDF(p),p

UNDEFINED:

    NT,u => NT,u


    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = { C,A,I,L,P,U }
    t in T = { terminal symbols }

Figure 3. Labelled Transformations

CONCATENATION:
    c : x1 x2 ... xn ,   xk = { rk ! "["rk"]" ! tk }

             NT,rk      if xk = rk
    NT,c =>
             COPT,rk    if xk = "["rk"]"

    COPT,r => NT,r

ALTERNATION:
    a : "{" r1 "!" r2 "!" ... "!" rn "}"

    NT,a => ALT,a

    ALT,a => { NT,r1 ! NT,r2 ! ... ! NT,rn }

ITERATION:
    i : "+" r

    NT,i => NT,r IOPT,i

    IOPT,i => NT,r IOPT,i

LIST:
    l : "#" r1 x "..." ,   x = { r2 ! "["r2"]" ! t }

    NT,l => NT,r1 LOPT,l

               NT,r2 NT,r1 LOPT,l      if x = r2
    LOPT,l =>  COPT,r2 NT,r1 LOPT,l    if x = "["r2"]"
               NT,r1 LOPT,l            if x = t

PREDEFINED:
    p : pdf

    NT,p => TERM,p

    TERM,p => PDF(p),p

UNDEFINED:

    NT,u => NT,u


    c in C = { concatenation rules }
    a in A = { alternation rules }
    i in I = { iteration rules }
    l in L = { list rules }
    p in P = { predefined rules }
    u in U = { undefined rules }
    r in R = { C,A,I,L,P,U }
    t in T = { terminal symbols }


Figure 4. Extended Transformations
[Diagram: GRAMMAR-DRIVEN ENVIRONMENT]


Figure 5. System Architecture (Data Flow)